ESTIMATING LINEAR FUNCTIONALS IN NONLINEAR
REGRESSION WITH RESPONSES MISSING AT RANDOM

By Ursula U. Müller

Texas A&M University 

We consider regression models with parametric (linear or nonlin- 
ear) regression function and allow responses to be "missing at ran- 
dom." We assume that the errors have mean zero and are indepen- 
dent of the covariates. In order to estimate expectations of functions 
of covariate and response we use a fully imputed estimator, namely an 
empirical estimator based on estimators of conditional expectations 
given the covariate. We exploit the independence of covariates and 
errors by writing the conditional expectations as unconditional expec- 
tations, which can now be estimated by empirical plug-in estimators. 
The mean zero constraint on the error distribution is exploited by 
adding suitable residual-based weights. We prove that the estimator 
is efficient (in the sense of Hájek and Le Cam) if an efficient estimator
of the parameter is used. Our results give rise to new efficient
estimators of smooth transformations of expectations. Estimation of 
the mean response is discussed as a special (degenerate) case. 

1. Introduction. Consider a regression model $Y = r_\vartheta(X) + \varepsilon$ with linear
or nonlinear regression function $r_\vartheta$ depending on a finite-dimensional parameter
$\vartheta$ in some open set. Assume that the covariate vector $X$ and the error
variable $\varepsilon$ are independent and that $E\varepsilon = 0$. Note that we do not make any
further model assumptions on the distributions of the variables. We are interested
in the situation where the response $Y$ is missing at random; in other
words, we always observe $X$ but only observe $Y$ in those cases where some
indicator $Z$ equals one, and the indicator $Z$ is conditionally independent of
$Y$ given $X$.

We want to estimate the expectation $Eh(X,Y)$ of some known square-integrable
function $h$ from a sample $(X_i, Z_iY_i, Z_i)$, $i = 1, \dots, n$, for example,
the mean response, higher moments of $Y$ or $X$, or mixed moments. If all



Received December 2007; revised July 2008. 

AMS 2000 subject classifications. Primary 62J02; secondary 62N01, 62F12, 62G20. 
Key words and phrases. Semiparametric regression, weighted empirical estimator, em- 
pirical likelihood, influence function, gradient, confidence interval. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2009, Vol. 37, No. 5A, 2245-2277. This reprint differs from the original in 
pagination and typographic detail. 






indicators $Z_i$ were 1, a simple consistent estimator would be the empirical
estimator $n^{-1}\sum_{i=1}^n h(X_i,Y_i)$. A related estimator for the missing data
situation considered here would be

$$\frac{1}{n}\sum_{i=1}^n \frac{Z_i}{\hat\pi(X_i)}\,h(X_i,Y_i)$$

with $\hat\pi(X)$ denoting an estimator of the conditional probability $\pi(X) =
P(Z = 1|X) = E(Z|X)$. Another estimator is the partially imputed estimator

$$\frac{1}{n}\sum_{i=1}^n \{Z_i h(X_i,Y_i) + (1 - Z_i)\hat\chi(X_i)\},$$

where $\hat\chi(X)$ is a (semiparametric) estimator of the conditional expectation
$\chi(X) = E\{h(X,Y)|X\}$. An alternative to this estimator is the fully imputed
estimator $n^{-1}\sum_{i=1}^n \hat\chi(X_i)$.
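
The three estimators just described can be written side by side. The following Python sketch is only an illustration under stated assumptions: the function names, the callables `pi_hat` and `chi_hat` (standing in for whatever estimators of $\pi$ and $\chi$ are available) and the toy data are hypothetical and do not come from the article.

```python
def ipw_estimate(xs, ys, zs, h, pi_hat):
    """Inverse-probability-weighted average: (1/n) sum Z_i h(X_i, Y_i) / pi_hat(X_i)."""
    total = sum(h(x, y) / pi_hat(x) for x, y, z in zip(xs, ys, zs) if z == 1)
    return total / len(xs)


def partially_imputed(xs, ys, zs, h, chi_hat):
    """Use h(X_i, Y_i) where Y_i is observed and chi_hat(X_i) where it is missing."""
    total = sum(h(x, y) if z == 1 else chi_hat(x) for x, y, z in zip(xs, ys, zs))
    return total / len(xs)


def fully_imputed(xs, chi_hat):
    """Replace every h(X_i, Y_i) by chi_hat(X_i), observed or not."""
    return sum(chi_hat(x) for x in xs) / len(xs)


# Toy data (hypothetical): two of four responses are missing.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.5, None, 5.0, None]
zs = [1, 0, 1, 0]
h = lambda x, y: y                  # mean response
pi_hat = lambda x: 0.5              # assumed constant response probability
chi_hat = lambda x: 1.0 + 2.0 * x   # assumed fitted conditional mean

ipw = ipw_estimate(xs, ys, zs, h, pi_hat)
partial = partially_imputed(xs, ys, zs, h, chi_hat)
full = fully_imputed(xs, chi_hat)
```

The sketch only shows how the three averages differ in which terms they impute; it says nothing about how `pi_hat` or `chi_hat` should be estimated.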

If a nonparametric estimator $\hat\chi$ is used, we expect all three estimators to
be asymptotically equivalent. For $h(X,Y) = Y$ and the last two estimators,
this is sketched in Cheng (1994). Here we assume a specific form of the conditional
distribution of $Y$ given $X$, and we can construct better estimators
than the nonparametric ones. We then expect the fully imputed estimator
$n^{-1}\sum_{i=1}^n \hat\chi(X_i)$ to be better than the partially imputed one, which in
turn should be better than the first estimator. For parametric models this
is shown for $h(X,Y) = Y$ by Tamhane (1978) and Matloff (1981). Müller,
Schick and Wefelmeyer (2006) show for several regression models (not including
the present one) and arbitrary $h$ that the fully imputed estimator is
usually better than the partially imputed estimator. That the same holds for
the nonlinear regression model considered here is intuitively clear: our model
$E(Y|X) = r_\vartheta(X)$ constitutes a structural constraint. The fully imputed
estimator, based on estimators $\hat\chi(X)$ that use the structure, will therefore be
better than the partially imputed estimator, which uses this information
only at data points where responses are missing.

In this article we study the fully imputed estimator based on suitable
estimators for $\chi(x)$ and show that it is efficient. The construction is as follows:
in a first step we exploit the independence of covariates and errors and
the structure of the regression model and write the conditional expectation
$\chi(x) = \chi(x,\vartheta)$ as an unconditional expectation of the error distribution,

$$\chi(x,\vartheta) = E\{h(X,Y)|X = x\} = Eh\{x, r_\vartheta(x) + \varepsilon\} = Eh\{x, r_\vartheta(x) + Y - r_\vartheta(X)\}.$$

This representation suggests an empirical plug-in estimator based on the
observed data, namely

$$\hat\chi(x,\hat\vartheta) = \sum_{j=1}^n Z_j h\{x, r_{\hat\vartheta}(x) + Y_j - r_{\hat\vartheta}(X_j)\} \Big/ \sum_{j=1}^n Z_j,$$

where $\hat\vartheta$ is an estimator of $\vartheta$. The corresponding fully imputed estimator is

$$\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n \frac{\sum_{j=1}^n Z_j h\{X_i, r_{\hat\vartheta}(X_i) + Y_j - r_{\hat\vartheta}(X_j)\}}{\sum_{j=1}^n Z_j}. \tag{1.1}$$

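It is straightforward to check that $\hat\chi(x,\vartheta)$ is consistent for $Eh\{x, r_\vartheta(x) +
\varepsilon\}$ [which yields consistency of $n^{-1}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta)$ with $\hat\vartheta$ consistent]; note
that $\hat\chi(x,\vartheta)$ tends in probability to $E[Zh\{x, r_\vartheta(x) + \varepsilon\}]/EZ$ with $EZ =
E\{E(Z|X)\} = E\pi(X)$. Now use the missing at random assumption and the
independence of $X$ and $\varepsilon$ to rewrite the numerator,

$$\begin{aligned}
E(E[Zh\{x, r_\vartheta(x) + \varepsilon\}|X]) &= E(E(Z|X)\,E[h\{x, r_\vartheta(x) + \varepsilon\}|X]) \\
&= E[\pi(X)\,Eh\{x, r_\vartheta(x) + \varepsilon\}] \\
&= E\pi(X)\,Eh\{x, r_\vartheta(x) + \varepsilon\}.
\end{aligned}$$

The limit of $\hat\chi(x,\vartheta)$ is therefore $\chi(x,\vartheta) = Eh\{x, r_\vartheta(x) + \varepsilon\}$.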
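
The plug-in construction in (1.1) can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the function name `fully_imputed`, the callables `h` and `r`, and the toy data are hypothetical, and `theta` stands for whatever estimate of $\vartheta$ is available (the sketch does not compute an efficient one).

```python
def fully_imputed(xs, ys, zs, h, r, theta):
    """Unweighted fully imputed estimate of E h(X, Y), following (1.1).

    xs: covariates; ys: responses (ignored where zs[j] == 0);
    zs: 0/1 response indicators; h(x, y): known function;
    r(theta, x): regression function; theta: parameter estimate (assumed given).
    """
    # Residuals Y_j - r_theta(X_j) of the complete cases (Z_j = 1) only.
    residuals = [y - r(theta, x) for x, y, z in zip(xs, ys, zs) if z == 1]
    if not residuals:
        raise ValueError("no responses observed")
    total = 0.0
    for x in xs:
        fitted = r(theta, x)
        # chi_hat(x): average of h(x, fitted value + residual) over complete cases.
        total += sum(h(x, fitted + e) for e in residuals) / len(residuals)
    return total / len(xs)


# Toy illustration (hypothetical data): r_theta(x) = theta * x, h(x, y) = y^2.
xs = [0.0, 1.0]
ys = [1.0, None]   # the second response is missing
zs = [1, 0]
estimate = fully_imputed(xs, ys, zs, lambda x, y: y * y, lambda t, x: t * x, 2.0)
```

Here the single observed residual is $1$, so the two imputed values are $(0+1)^2$ and $(2+1)^2$, and their average is the estimate. The double loop costs on the order of $n$ times the number of complete cases evaluations of `h`.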

The estimator (1.1) is well thought out and consistent. However, it is not
yet efficient, even if an efficient estimator for $\vartheta$ is used (which is relatively
elaborate in the model considered here; see Section 5): we focus on the
common situation where the errors have mean zero; this information must
also be incorporated in order to obtain efficiency.

Motivated by Owen's empirical likelihood approach, we improve the above
estimator by introducing weights which use the mean zero constraint on the
error distribution. However, in contrast to the original approach, we cannot
observe the errors and must use residuals. This clearly complicates the
situation: since we have missing responses the residuals are partially incomplete
and, moreover, they involve parameter estimates $\hat\vartheta$. Formally, we choose
weights $w_j$ based on residuals $\hat\varepsilon_j = Y_j - r_{\hat\vartheta}(X_j)$ such that $\sum_{j=1}^n w_j Z_j \hat\varepsilon_j = 0$.
(See Section 3 for more details.)

Our final estimator now is a weighted version of the above fully imputed
estimator, namely

$$\frac{1}{n}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n \frac{\sum_{j=1}^n w_j Z_j h\{X_i, r_{\hat\vartheta}(X_i) + Y_j - r_{\hat\vartheta}(X_j)\}}{\sum_{j=1}^n Z_j}. \tag{1.2}$$

The combination of full imputation methods (involving estimators of unconditional
expectations of the error distribution) with empirical likelihood
ideas provides a new methodology which has not appeared in the literature
before. We show in this article that $n^{-1}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta)$ is efficient if an efficient
estimator $\hat\vartheta$ for $\vartheta$ is used. The partially imputed estimator will in
general not be efficient, even if $\hat\vartheta$ is efficient for $\vartheta$.

For estimation of the mean response, that is, if $h(X,Y) = Y$, which is
of particular interest and typically considered in the literature, the estimator
simplifies to the straightforward estimator $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$. That the
unweighted estimator (1.1) for $EY$ cannot be efficient is immediately apparent:
consider the case where all responses are observed. Here (1.1) reduces
to the empirical estimator $n^{-1}\sum_{i=1}^n Y_i$, which does not use the regression
structure at all. It will be seen that its influence function is not the efficient
one. (See Section 6 for details.)
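
Both simplifications for $h(X,Y) = Y$ can be checked directly. The lines below are a short verification sketch; it uses the form of the weights given in Section 3, $w_j = 1/(1 + \lambda Z_j\hat\varepsilon_j)$, equivalently $w_j = 1 - \lambda w_j Z_j \hat\varepsilon_j$, together with $Z_j^2 = Z_j$ and the constraint $\sum_j w_j Z_j \hat\varepsilon_j = 0$.

```latex
\begin{align*}
\sum_{j=1}^n w_j Z_j
  &= \sum_{j=1}^n Z_j - \lambda \sum_{j=1}^n w_j Z_j \hat\varepsilon_j
   = \sum_{j=1}^n Z_j, \\
\hat\chi_w(x,\hat\vartheta)
  &= \frac{\sum_{j} w_j Z_j \{ r_{\hat\vartheta}(x) + \hat\varepsilon_j \}}{\sum_{j} Z_j}
   = r_{\hat\vartheta}(x)\,\frac{\sum_{j} w_j Z_j}{\sum_{j} Z_j}
     + \frac{\sum_{j} w_j Z_j \hat\varepsilon_j}{\sum_{j} Z_j}
   = r_{\hat\vartheta}(x), \\
\text{all } Z_j = 1,\ w_j = 1: \quad
\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta)
  &= \frac{1}{n}\sum_{i=1}^n r_{\hat\vartheta}(X_i)
     + \frac{1}{n}\sum_{j=1}^n \{ Y_j - r_{\hat\vartheta}(X_j) \}
   = \frac{1}{n}\sum_{j=1}^n Y_j .
\end{align*}
```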

Our efficiency results are based on the Hájek-Le Cam theory for locally
asymptotically normal families. As a consequence, our proposed estimators
have a limiting normal distribution with the asymptotic variance determined
by the influence function. It is therefore straightforward to construct asymptotic
confidence intervals for $Eh(X,Y)$ (see Section 6.3).

In addition, estimators for smooth (continuously differentiable) transformations
of expectations $Eh(X,Y)$ are also now available, with the variance
of the response, $\operatorname{Var} Y = EY^2 - E^2Y$, as an important example. Since efficiency
is preserved by smooth transformations, plugging in efficient estimators
yields an efficient estimator of the transformation. The transformation
for $\operatorname{Var} Y$ in terms of the first two moments is $(EY, EY^2) \mapsto EY^2 - (EY)^2$.
Plugging in $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$ and the weighted fully imputed estimator
for $EY^2$ (which is straightforward to compute and is also given in
Section 6) gives an efficient estimator of the variance.

To our knowledge, our estimator (1.2) is the first efficient estimator for
arbitrary linear functionals $Eh(X,Y)$ (including the mean functional $EY$)
in the nonlinear regression model (including the linear regression model
$Y = \vartheta^\top X + \varepsilon$) with independent centered errors when responses are missing
at random. Matloff (1981) considers estimation of the mean $EY$ in a model
related to ours, the (parametric) conditional mean model, $E(Y|X) = r_\vartheta(X)$,
which can (but need not) also be written in the form $Y = r_\vartheta(X) + \varepsilon$ with
conditionally centered errors, $E(\varepsilon|X) = 0$. He shows that the average of the
estimated regression function values (with his estimator $\hat\vartheta$ of $\vartheta$) improves
upon the partially imputed estimator. Wang and Rao (2001) consider linearly
constrained covariates and develop an empirical likelihood approach
for inference about the mean in linear regression (with independent errors) 
based on partial linear regression imputation. In Wang and Rao (2002) they 
present an empirical likelihood approach for inference about the mean re- 
sponse in nonparametric regression, based on partial kernel regression impu- 
tation as suggested by Cheng (1994). A different empirical likelihood method 
for this setting is proposed by Qin and Zhang (2007). Wang (2004) assumes a 
parametric model for the conditional density of Y given X, with constraints 
on the covariate distribution, and introduces a weighted partial imputation 
estimator for the mean, utilizing empirical likelihood techniques. Wang, Linton
and Härdle (2004) consider a partially linear regression model for the
conditional mean function and derive inference tools for the mean response
based on a class of asymptotically equivalent (partially and fully imputed)
estimators. A related article is Liang, Wang and Carroll (2007), who additionally
assume that covariates are measured with error. Chen, Fan, Li and
Zhou (2006) consider partially imputed estimators for the mean response in a 
quasi-likelihood setting. Maity, Ma and Carroll (2007) estimate expectations 
in semi-parametric regression models, with and without missing responses. 
They consider a general regression function involving a parametric and a 
nonparametric part, thus covering the partly linear model, and assume that 
the likelihood function given the covariates is known. 

For estimating expectations, little attention has been given to the fully 
imputed estimator. We anticipate that in many situations, in particular in 
models with structural assumptions, improved estimators can be obtained 
by using appropriate full imputation instead of partial imputation estimates. 

Inference for missing data has been studied by many authors, also recently. 
Chen and Wang (2009) study estimation of parameters which are defined 
by model constraints. They introduce an empirical likelihood approach in- 
volving estimating equations, where missing variables are replaced using a 
nonparametric imputation approach. Chen, Hong and Tarozzi (2008) con- 
sider parameter estimation as well. They introduce efficient estimators for 
parameters in GMM models with missing data, and assume that the miss- 
ingness can be explained by auxiliary variables. More references to recent 
literature can be found, for example, in Wang, Linton and Härdle (2004)
and in the monograph by Tsiatis (2006). For an introduction, see Tsiatis 
(2006) and the books by Little and Rubin (2002) and Gelman et al. (1995). 

This paper is organized as follows. In Section 2 we derive a stochastic 
expansion of the unweighted estimator. The expansion of the weighted es- 
timator is given in Section 3, utilizing the results of Section 2. Section 4 
characterizes efficient estimators of arbitrary functionals of the joint distribution
and gives the efficient influence function of the functional $Eh(X,Y)$
in the nonlinear regression model. In Section 5 we characterize efficient estimators
for the parameter vector $\vartheta$ and briefly sketch the construction of
such an estimator. In this section we also show our main result, that the
weighted estimator with an efficient estimator $\hat\vartheta$ for $\vartheta$ plugged in is efficient
for $Eh(X,Y)$. Section 6 contains a short discussion of special cases
such as estimation of the mean response. We also compare, using computer 
simulations, the efficient (weighted fully imputed) estimator with the other 
approaches, with convincing results. For these studies we considered a linear 
and a nonlinear regression function and estimation of two simple function- 
als, namely of the response mean and second moment, for which the efficient 
(weighted fully imputed) estimator simplifies, and estimation of a more com- 
plicated expectation. We also briefly sketch the construction of confidence 
intervals. 

2. Expansion of the unweighted estimator. In this section we derive an
expansion of the unweighted estimator $n^{-1}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta)$, which is a special
case of the weighted estimator $n^{-1}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta)$ with all weights being
equal to one, $w_j = 1$. This can be regarded as a result of independent interest
since the estimator (with an appropriate estimator $\hat\vartheta$) would be relevant for
regression models where the errors cannot be assumed to have mean zero.
Also, we will see in the next section that the weighted estimator can be
written as the sum of the unweighted estimator and an additional correction
term. Hence we can utilize the results later when we derive an expansion of
the weighted estimator.

Throughout this paper we will assume that $Y$ is square integrable and
that the error variance $E\varepsilon^2 = \sigma^2$ is nonzero and finite. We also suppose
that the error distribution has a Lebesgue density $f$ and finite Fisher information,
$E\ell^2(\varepsilon) < \infty$, where $\ell$ denotes the score function for location,
$\ell(\varepsilon) = -f'(\varepsilon)/f(\varepsilon)$. The degenerate case that we (almost surely) never observe
a response $Y$ will be excluded by assuming $P(Z = 1) = EZ > 0$. The
following assumptions will also be required.

Assumption 1. The regression function $\tau \mapsto r_\tau(x)$ is differentiable at
$\tau = \vartheta$ with a $p$-dimensional square integrable gradient $\dot r_\vartheta(x)$ which satisfies
the Lipschitz condition

$$|\dot r_\tau(x) - \dot r_\vartheta(x)| \le |\tau - \vartheta|\,a(x), \qquad a(X) \text{ square integrable}.$$

Later we will also need that the covariance matrix of an efficient parameter
estimator $\hat\vartheta$ [which involves the covariance matrix of $\dot r_\vartheta(X)$ and the Fisher
information] is invertible.

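Now use a Taylor expansion to see that

$$\begin{aligned}
\sum_{i=1}^n \{r_\tau(X_i) - r_\vartheta(X_i) - \dot r_\vartheta(X_i)^\top(\tau - \vartheta)\}^2
&= \sum_{i=1}^n \Big[\int_0^1 \{\dot r_{\vartheta + u(\tau - \vartheta)}(X_i) - \dot r_\vartheta(X_i)\}^\top(\tau - \vartheta)\,du\Big]^2 \\
&\le \sum_{i=1}^n |\tau - \vartheta|^2 \int_0^1 |\dot r_{\vartheta + u(\tau - \vartheta)}(X_i) - \dot r_\vartheta(X_i)|^2\,du \\
&\le |\tau - \vartheta|^4 \sum_{i=1}^n a^2(X_i).
\end{aligned}$$

Assumption 1 therefore guarantees that the function $\tau \mapsto r_\tau(X)$ is stochastically
differentiable, that is, for each constant $C$,

$$\sup_{|\tau - \vartheta| \le Cn^{-1/2}} \sum_{i=1}^n \{r_\tau(X_i) - r_\vartheta(X_i) - \dot r_\vartheta(X_i)^\top(\tau - \vartheta)\}^2 = o_p(1). \tag{2.1}$$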

We will not need the first partial derivative of $h(x,y)$, $\partial/\partial x\,h(x,y)$. Therefore
we will write $h'$ for the partial derivative with respect to the second argument,
$h'(x,y) = \partial_2 h(x,y) = \partial/\partial y\,h(x,y)$.

Assumption 2. The function $h(x,y)$ is differentiable in $y$ with a square
integrable partial derivative $h'(x,y) = \partial/\partial y\,h(x,y)$ which satisfies the Lipschitz
condition

$$|h'(x,z) - h'(x,y)| \le |z - y|\,b(x,y), \qquad b(X,Y) \text{ square integrable}.$$

In the following $\bar Z$ will denote the average of the indicators $Z_i$, $\bar Z =
n^{-1}\sum_{i=1}^n Z_i$. The next lemma gives the expansion of the estimator around
the true parameter $\vartheta$.



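Lemma 2.1. Assume that Assumptions 1 and 2 hold and that $\hat\vartheta$ is a $\sqrt n$
consistent estimator of $\vartheta$. Then the unweighted estimator has the expansion

$$\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\vartheta) + D^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}) \tag{2.2}$$

with $D = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon))$.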



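Proof. For reasons of clarity we introduce the notation

$$h_{ij}(\tau) = h\{X_i, r_\tau(X_i) + Y_j - r_\tau(X_j)\}$$

and write $\dot h_{ij}$ for the gradient. Then

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta)
&= \frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n Z_j h_{ij}(\hat\vartheta) \\
&= \frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n \Big[\sum_{j \ne i} Z_j h\{X_i, r_{\hat\vartheta}(X_i) + Y_j - r_{\hat\vartheta}(X_j)\} + Z_i h(X_i,Y_i)\Big] \\
&= \frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n \Big\{\sum_{j \ne i} Z_j h_{ij}(\vartheta) + Z_i h(X_i,Y_i)\Big\}
 + \frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} Z_j\{h_{ij}(\hat\vartheta) - h_{ij}(\vartheta)\} \\
&= \frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\vartheta) + \frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} Z_j\{h_{ij}(\hat\vartheta) - h_{ij}(\vartheta)\}.
\end{aligned} \tag{2.3}$$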
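
Below we will show that

$$\frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} Z_j\{h_{ij}(\hat\vartheta) - h_{ij}(\vartheta)\} = D^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}) \tag{2.4}$$

with $D = (EZ)^{-1}E[Z_2 h'\{X_1, r_\vartheta(X_1) + Y_2 - r_\vartheta(X_2)\}\{\dot r_\vartheta(X_1) - \dot r_\vartheta(X_2)\}]$. That
this $D$ is indeed of the form given in the lemma can be seen as follows. Consider

$$D = E[h'\{X_1, r_\vartheta(X_1) + \varepsilon_2\}\dot r_\vartheta(X_1)] - \frac{1}{EZ}E[h'\{X_1, r_\vartheta(X_1) + \varepsilon_2\}\mathbf{1}(Z_2 = 1)\dot r_\vartheta(X_2)].$$

The first term can be written $E(E[h'\{X_1, r_\vartheta(X_1) + \varepsilon_2\}|X_1]\dot r_\vartheta(X_1))$. Integration
by parts of the inner integral gives $E[h'\{X_1, r_\vartheta(X_1) + \varepsilon_2\}|X_1] =
E[h\{X_1, r_\vartheta(X_1) + \varepsilon_2\}\ell(\varepsilon_2)|X_1]$. The second term is $E[h'\{X_1, r_\vartheta(X_1) +
\varepsilon_2\}]E\{\dot r_\vartheta(X)|Z = 1\}$. We proceed analogously and, in conclusion, obtain

$$D = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon)). \tag{2.5}$$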
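
The integration by parts step used for the first term can be spelled out as follows. This is a sketch for fixed $x$ under the additional assumption (not stated explicitly in the text) that the boundary term $h\{x, r_\vartheta(x) + e\}f(e)$ vanishes as $e \to \pm\infty$; recall $\ell = -f'/f$.

```latex
\begin{align*}
E[h'\{x, r_\vartheta(x) + \varepsilon\}]
  &= \int h'\{x, r_\vartheta(x) + e\}\, f(e)\, de
   = -\int h\{x, r_\vartheta(x) + e\}\, f'(e)\, de \\
  &= \int h\{x, r_\vartheta(x) + e\}\, \ell(e)\, f(e)\, de
   = E[h\{x, r_\vartheta(x) + \varepsilon\}\, \ell(\varepsilon)].
\end{align*}
```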

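The result now follows from (2.3), (2.4) and (2.5). It remains to verify (2.4).
The proof consists of two parts,

$$\frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} Z_j\{h_{ij}(\hat\vartheta) - h_{ij}(\vartheta) - \dot h_{ij}(\vartheta)^\top(\hat\vartheta - \vartheta)\} = o_p(n^{-1/2}), \tag{2.6}$$

$$\frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} Z_j \dot h_{ij}(\vartheta)^\top(\hat\vartheta - \vartheta) = D^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}). \tag{2.7}$$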

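Statement (2.7) can be quickly proved: since $\hat\vartheta$ is $\sqrt n$ consistent we can
replace the gradient by its expectation,

$$\begin{aligned}
\frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} Z_j \dot h_{ij}(\vartheta)^\top(\hat\vartheta - \vartheta)
&= \frac{1}{\bar Z}\frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} E\{Z_j \dot h_{ij}(\vartheta)\}^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}) \\
&= \frac{1}{EZ}E\{Z_2 \dot h_{12}(\vartheta)\}^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2})
\end{aligned}$$

with $(EZ)^{-1}E\{Z_2 \dot h_{12}(\vartheta)\} = D$ as given in (2.4). For the proof of (2.6) it
suffices to show that

$$\sum_{i=1}^n \Big[\frac{1}{n}\sum_{j \ne i} \{h_{ij}(\hat\vartheta) - h_{ij}(\vartheta) - \dot h_{ij}(\vartheta)^\top(\hat\vartheta - \vartheta)\}\Big]^2 = o_p(1).$$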
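This holds by the following arguments. Rewrite the above expression and
apply the Cauchy-Schwarz inequality to obtain

$$\begin{aligned}
&\sum_{i=1}^n \Big(\frac{1}{n}\sum_{j \ne i} \int_0^1 [\dot h_{ij}\{\vartheta + u(\hat\vartheta - \vartheta)\} - \dot h_{ij}(\vartheta)]^\top(\hat\vartheta - \vartheta)\,du\Big)^2 \\
&\qquad \le \sum_{i=1}^n \frac{1}{n}\sum_{j \ne i} |\hat\vartheta - \vartheta|^2 \int_0^1 |\dot h_{ij}\{\vartheta + u(\hat\vartheta - \vartheta)\} - \dot h_{ij}(\vartheta)|^2\,du.
\end{aligned}$$

The difference $|\dot h_{ij}\{\vartheta + u(\hat\vartheta - \vartheta)\} - \dot h_{ij}(\vartheta)|^2$ is bounded by $|\hat\vartheta - \vartheta|^2$ times
a square integrable function $A_{ij}$. This holds due to Assumptions 1 and 2,
namely the Lipschitz conditions on $\dot r$ and $h'$ and since $a(X)$, $b(X,Y)$, $\dot r_\vartheta(X)$
and $h'(X,Y)$ are square integrable. Summing up, the expression is bounded
by $|\hat\vartheta - \vartheta|^4\, n^{-1}\sum_{i=1}^n\sum_{j \ne i} A_{ij}$, which tends to zero in probability since $\hat\vartheta$ is $\sqrt n$
consistent. □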



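We will now replace the estimated conditional expectation $\hat\chi$ in the right-hand
side of (2.2) by the true one. Set

$$S = \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j \ne i} \frac{Z_j}{EZ}\,h\{X_i, r_\vartheta(X_i) + Y_j - r_\vartheta(X_j)\}.$$

We have

$$\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\vartheta) = \frac{EZ}{\bar Z}S + O_p(n^{-1}) = S - \frac{\bar Z - EZ}{EZ}ES + o_p(n^{-1/2})$$

and, by the Hoeffding decomposition,

$$S = ES + \frac{1}{n}\sum_{i=1}^n \{\chi(X_i,\vartheta) - ES\} + \frac{1}{n}\sum_{i=1}^n \Big\{\frac{Z_i}{EZ}\bar h(\varepsilon_i) - ES\Big\} + o_p(n^{-1/2})$$

with $\bar h(\varepsilon) = E\{h(X,Y)|\varepsilon\}$, $ES = Eh(X,Y) = E\bar h(\varepsilon)$. Combining the above
yields

$$\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\vartheta) = \frac{1}{n}\sum_{i=1}^n \chi(X_i,\vartheta) + \frac{1}{n}\sum_{i=1}^n \frac{Z_i}{EZ}\{\bar h(\varepsilon_i) - E\bar h(\varepsilon)\} + o_p(n^{-1/2}).$$

This and Lemma 2.1 give our expansion for the unweighted estimator which
we formulate as a corollary.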

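Corollary 2.2. Assume that Assumptions 1 and 2 hold and that $\hat\vartheta$
is a $\sqrt n$ consistent estimator of $\vartheta$. Then, with $D = E(h(X,Y)[\dot r_\vartheta(X) -
E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon))$ and $\bar h(\varepsilon) = E\{h(X,Y)|\varepsilon\}$, the unweighted estimator
has the expansion

$$\frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n \Big[\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\{\bar h(\varepsilon_i) - E\bar h(\varepsilon_i)\}\Big] + D^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}).$$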



3. Expansion of the weighted estimator. In this section we study the
weighted estimator which uses residual-based weights, $w_j$, that are constructed
by adapting empirical likelihood techniques. The approach is to
maximize $\prod_{j=1}^n w_j$ subject to the mean zero constraint on the error distribution,
$\sum_{j=1}^n w_j Z_j \hat\varepsilon_j = 0$, with $w_j > 0$ and $\sum_{j=1}^n w_j = n$. The weights solving
this optimization problem are given by $w_j = 1/(1 + \lambda Z_j \hat\varepsilon_j)$, where $\lambda$ denotes
the Lagrange multiplier, provided $\lambda$ exists. As shown by Owen (1988, 2001),
this is the case if not all residuals have the same sign, that is, on the event
$\min_{1 \le j \le n} \hat\varepsilon_j < 0 < \max_{1 \le j \le n} \hat\varepsilon_j$, which has probability tending to one since
the residuals $\hat\varepsilon_j$ are uniformly close to the centered errors $\varepsilon_j$ [see (A.1) in
the Appendix]. If $\lambda$ does not exist, we set $\lambda = 0$. Note that the weights equal
one if $Z_j = 0$ or $\lambda = 0$. For computational issues we refer to Section 2.9 of
Owen's book (2001).
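
One simple way to obtain $\lambda$ numerically is bisection, since $\lambda \mapsto \sum_j Z_j\hat\varepsilon_j/(1 + \lambda Z_j\hat\varepsilon_j)$ is strictly decreasing on the interval where all weights are positive. The Python sketch below is an illustration only and is not the algorithm recommended by Owen (2001); the function name `el_weights` and the toy residuals are hypothetical.

```python
def el_weights(residuals, zs, iterations=200):
    """Return (lam, weights) with weights w_j = 1 / (1 + lam * Z_j * e_j).

    residuals[j] is used only where zs[j] == 1. If the observed residuals
    do not change sign, no multiplier exists and lam is set to 0.
    """
    observed = [e for e, z in zip(residuals, zs) if z == 1]
    n = len(zs)
    if not observed or min(observed) >= 0.0 or max(observed) <= 0.0:
        return 0.0, [1.0] * n

    def constraint(lam):
        # sum_j w_j Z_j e_j as a function of lam; strictly decreasing in lam.
        return sum(e / (1.0 + lam * e) for e in observed)

    # All weights are positive exactly for lam inside this open interval.
    lo, hi = -1.0 / max(observed), -1.0 / min(observed)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if constraint(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    weights = [1.0 / (1.0 + lam * e) if z == 1 else 1.0
               for e, z in zip(residuals, zs)]
    return lam, weights


# Hypothetical residuals; the entry with Z_j = 0 is a placeholder and is ignored.
residuals = [1.0, -2.0, 0.5, 0.0, -0.25]
zs = [1, 1, 1, 0, 1]
lam, weights = el_weights(residuals, zs)
```

At the returned multiplier the weighted observed residuals sum to zero up to numerical error, and the weights add up to $n$, as implied by the identity $w_j = 1 - \lambda w_j Z_j\hat\varepsilon_j$.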

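The formula for the weights can be written as an identity, $w_j = 1 -
\lambda w_j Z_j \hat\varepsilon_j$. This enables us to decompose the estimator into the unweighted
estimator and an additional correction term,

$$\frac{1}{n}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta) - \frac{\lambda}{n^2}\sum_{i=1}^n\sum_{j=1}^n w_j \hat\varepsilon_j \frac{Z_j}{\bar Z}\,h\{X_i, r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\}. \tag{3.1}$$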
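
The decomposition (3.1) is an exact algebraic identity and can be checked numerically. The following Python sketch computes the unweighted estimator, the weighted estimator and the correction term on toy data; all names, the data and the value of `theta` are hypothetical, and the multiplier is found by plain bisection rather than by any method from the article.

```python
def el_lambda(observed, iterations=200):
    """Multiplier solving sum e / (1 + lam * e) = 0, or 0.0 if none exists."""
    if not observed or min(observed) >= 0.0 or max(observed) <= 0.0:
        return 0.0
    lo, hi = -1.0 / max(observed), -1.0 / min(observed)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(e / (1.0 + mid * e) for e in observed) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decomposition(xs, ys, zs, h, r, theta):
    """Return (unweighted, weighted, correction) for the terms in (3.1)."""
    n = len(xs)
    # Residuals of complete cases; 0.0 is a placeholder where Y is missing.
    res = [(y - r(theta, x)) if z == 1 else 0.0 for x, y, z in zip(xs, ys, zs)]
    lam = el_lambda([e for e, z in zip(res, zs) if z == 1])
    w = [1.0 / (1.0 + lam * z * e) for e, z in zip(res, zs)]
    unweighted = weighted = correction = 0.0
    for x in xs:
        fitted = r(theta, x)
        for e, z, wj in zip(res, zs, w):
            if z == 1:
                value = h(x, fitted + e)
                unweighted += value
                weighted += wj * value
                correction += wj * e * value
    scale = 1.0 / (n * sum(zs))   # equals 1 / (n^2 * Zbar)
    return unweighted * scale, weighted * scale, lam * correction * scale


# Toy data (hypothetical): linear regression function, theta assumed given.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.4, None, 2.7, 4.3, None, 5.9]
zs = [1, 0, 1, 1, 0, 1]
r = lambda t, x: t[0] + t[1] * x
theta = (1.0, 2.0)

unw, wtd, corr = decomposition(xs, ys, zs, lambda x, y: y * y, r, theta)
mean_unw, mean_wtd, mean_corr = decomposition(xs, ys, zs, lambda x, y: y, r, theta)
```

For $h(x,y) = y$ the weighted value coincides, up to numerical error, with the average of the fitted values $r_\vartheta(X_i)$, in line with the simplification noted in the Introduction.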

Since we have already derived an expansion of the unweighted estimator
(see Corollary 2.2) we only need to study the second term on the right-hand
side. In Lemma 3.1 we will derive an expansion of the estimated Lagrange
multiplier $\lambda$ and use this result in Lemma 3.2, where we determine an approximation
of the extra term. For the proof of Lemma 3.1 we proceed analogously
to Owen (2001), pages 219-221 [compare also Müller, Schick and
Wefelmeyer (2005)]. This requires some auxiliary results which are proved
in the Appendix, namely

$$\max_{1 \le j \le n} |Z_j \hat\varepsilon_j| = o_p(n^{1/2}), \tag{3.2}$$

$$\frac{1}{n}\sum_{i=1}^n Z_i \hat\varepsilon_i = \frac{1}{n}\sum_{i=1}^n Z_i \varepsilon_i - EZ\,E\{\dot r_\vartheta(X)|Z = 1\}^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}), \tag{3.3}$$

$$\frac{1}{n}\sum_{i=1}^n Z_i \hat\varepsilon_i^2 = \frac{1}{n}\sum_{i=1}^n Z_i \varepsilon_i^2 + o_p(1) = EZ\,\sigma^2 + o_p(1), \tag{3.4}$$

where $\hat\vartheta$ is a $\sqrt n$ consistent estimator of $\vartheta$ and $\sigma^2 > 0$ the error variance.

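Lemma 3.1. Suppose that Assumption 1 is satisfied and let $\hat\vartheta$ be a $\sqrt n$
consistent estimator of $\vartheta$. Then $\max_{1 \le j \le n} |w_j - 1| = o_p(1)$ and

$$\lambda = \frac{1}{\sigma^2}\frac{1}{n}\sum_{i=1}^n \frac{Z_i}{EZ}\varepsilon_i - \frac{1}{\sigma^2}E\{\dot r_\vartheta(X)|Z = 1\}^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}) = O_p(n^{-1/2}). \tag{3.5}$$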

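Proof. We first derive the order of $\lambda$. Recall that $w_j = 1/(1 + \lambda Z_j \hat\varepsilon_j)$,
that $w_j + \lambda w_j Z_j \hat\varepsilon_j = 1$ and that $\sum_{j=1}^n w_j Z_j \hat\varepsilon_j = 0$ by construction. Also note
that the $Z_j$'s are binary and that therefore $Z_j^2 = Z_j$. This allows us to write

$$\frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j = \frac{1}{n}\sum_{j=1}^n (w_j + \lambda w_j Z_j \hat\varepsilon_j) Z_j \hat\varepsilon_j = \lambda \frac{1}{n}\sum_{j=1}^n w_j Z_j \hat\varepsilon_j^2 = \lambda \frac{1}{n}\sum_{j=1}^n \frac{Z_j \hat\varepsilon_j^2}{1 + \lambda Z_j \hat\varepsilon_j}. \tag{3.6}$$

Note that $1 + \lambda Z_j \hat\varepsilon_j > 0$ since the weights are positive. Then

$$\begin{aligned}
|\lambda| \frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j^2
&= |\lambda| \frac{1}{n}\sum_{j=1}^n \frac{Z_j \hat\varepsilon_j^2}{1 + \lambda Z_j \hat\varepsilon_j}(1 + \lambda Z_j \hat\varepsilon_j) \\
&\le |\lambda| \frac{1}{n}\sum_{j=1}^n \frac{Z_j \hat\varepsilon_j^2}{1 + \lambda Z_j \hat\varepsilon_j}\Big(1 + |\lambda| \max_{1 \le j \le n} |Z_j \hat\varepsilon_j|\Big) \\
&= \Big|\frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j\Big|\Big(1 + |\lambda| \max_{1 \le j \le n} |Z_j \hat\varepsilon_j|\Big).
\end{aligned}$$

The last equality holds due to (3.6). Applying (3.2), (3.3) and (3.4) to the
first and last terms of the inequality we obtain $|\lambda| \cdot O_p(1) = O_p(n^{-1/2}) +
|\lambda| o_p(1)$ which implies $\lambda = O_p(n^{-1/2})$. This and (3.2) give $\max_{1 \le j \le n} |\lambda Z_j \hat\varepsilon_j| =
o_p(1)$ and therefore our first statement,

$$\max_{1 \le j \le n} |w_j - 1| = \max_{1 \le j \le n} \Big|\frac{-\lambda Z_j \hat\varepsilon_j}{1 + \lambda Z_j \hat\varepsilon_j}\Big| = o_p(1).$$

We now again make use of (3.6) and write

$$\frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j = \lambda \Big\{\frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j^2 + \frac{1}{n}\sum_{j=1}^n (w_j - 1) Z_j \hat\varepsilon_j^2\Big\} = \lambda \frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j^2 + o_p(n^{-1/2}).$$

For the last statement we utilized (3.4), $\max_{1 \le j \le n} |w_j - 1| = o_p(1)$ and $\lambda =
O_p(n^{-1/2})$. Using (3.4) once more, we obtain

$$\lambda = \frac{1}{\sigma^2}\frac{1}{EZ}\frac{1}{n}\sum_{j=1}^n Z_j \hat\varepsilon_j + o_p(n^{-1/2}).$$

Inserting approximation (3.3) for $n^{-1}\sum_{j=1}^n Z_j \hat\varepsilon_j$ finally yields the desired
approximation of $\lambda$. □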

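Lemma 3.2. Suppose that Assumptions 1 and 2 are satisfied and let $\hat\vartheta$
be a $\sqrt n$ consistent estimator of $\vartheta$. Then, with $\bar h(\varepsilon) = E\{h(X,Y)|\varepsilon\}$,

$$\begin{aligned}
\frac{\lambda}{n^2}\sum_{i=1}^n\sum_{j=1}^n w_j \hat\varepsilon_j \frac{Z_j}{\bar Z}\,h\{X_i, r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\}
&= \frac{1}{n}\sum_{i=1}^n \frac{Z_i}{EZ}\frac{\varepsilon_i}{\sigma^2}E\{\varepsilon \bar h(\varepsilon)\} - \frac{1}{\sigma^2}E\{\varepsilon \bar h(\varepsilon)\}E\{\dot r_\vartheta(X)|Z = 1\}^\top(\hat\vartheta - \vartheta) \\
&\quad + o_p(n^{-1/2}).
\end{aligned}$$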

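Proof. Since $\lambda = O_p(n^{-1/2})$ and $\max_{1 \le j \le n} |w_j - 1| = o_p(1)$ by the previous
lemma, and since $\max_{1 \le i \le n} |Z_i \hat\varepsilon_i| = o_p(n^{1/2})$ by (3.2), it is clear that
the terms of the sum where $j = i$, that is, $h\{X_i, r_{\hat\vartheta}(X_i) + \hat\varepsilon_i\} = h(X_i,Y_i)$, can
be ignored. It therefore suffices to prove the statement for

$$\frac{\lambda}{n^2}\sum_{i=1}^n\sum_{j \ne i} w_j \hat\varepsilon_j \frac{Z_j}{\bar Z}\,h\{X_i, r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\} = \lambda \frac{EZ}{\bar Z}\psi_w(\hat\vartheta) = \lambda \frac{EZ}{\bar Z}\psi(\hat\vartheta) + \lambda \frac{EZ}{\bar Z}\{\psi_w(\hat\vartheta) - \psi(\hat\vartheta)\}$$

with

$$\psi_w(\hat\vartheta) = \frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} w_j \hat\varepsilon_j \frac{Z_j}{EZ}\,h\{X_i, r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\},$$

where $\psi$ is $\psi_w$ with $w_j = 1$. The second part involving the difference $w_j - 1$
is $o_p(n^{-1/2})$, which can be seen as follows: using $\lambda = O_p(n^{-1/2})$ and
$\max_{1 \le j \le n} |w_j - 1| = o_p(1)$ we obtain

$$\Big|\lambda \frac{EZ}{\bar Z}\{\psi_w(\hat\vartheta) - \psi(\hat\vartheta)\}\Big| \le |\lambda| \frac{1}{\bar Z} \max_{1 \le j \le n} |w_j - 1| \frac{1}{n^2}\sum_{i=1}^n\sum_{j \ne i} |Z_j \hat\varepsilon_j\,h\{X_i, r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\}|.$$

This gives the claimed rate $o_p(n^{-1/2})$ since the sum is bounded in probability,
which follows from the $\sqrt n$ consistency of $\hat\vartheta$ and Assumptions 1 and 2 on
the terms of the product $\{Y_2 - r_\tau(X_2)\}h\{X_1, r_\tau(X_1) + Y_2 - r_\tau(X_2)\}$.

It remains to consider $\lambda (EZ/\bar Z)\psi(\hat\vartheta)$. Using $\lambda = O_p(n^{-1/2})$ we can replace
$\psi(\hat\vartheta)$ by $\psi(\vartheta)$ since $\psi(\hat\vartheta) - \psi(\vartheta) = o_p(1)$, which again follows from Assumptions
1 and 2 and the consistency of $\hat\vartheta$. Further, by the law of large numbers,
$EZ/\bar Z = 1 + o_p(1)$ and $\psi(\vartheta) - E\psi(\vartheta) = o_p(1)$. These arguments yield

$$\lambda \frac{EZ}{\bar Z}\psi(\hat\vartheta) = \lambda E\psi(\vartheta) + o_p(n^{-1/2}).$$

The expected value of $\psi(\vartheta)$ is

$$E\psi(\vartheta) = \frac{n-1}{n}E\Big[\varepsilon_2 \frac{Z_2}{EZ}\,h\{X_1, r_\vartheta(X_1) + \varepsilon_2\}\Big] = \frac{n-1}{n}E\{\varepsilon h(X,Y)\} = \frac{n-1}{n}E\{\varepsilon \bar h(\varepsilon)\}.$$

Summing up,

$$\lambda \frac{EZ}{\bar Z}\psi_w(\hat\vartheta) = \lambda E\{\varepsilon \bar h(\varepsilon)\} + o_p(n^{-1/2}).$$

Inserting expansion (3.5) for $\lambda$ into the above completes the proof. □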



Combining the previous lemma and the approximation of the unweighted
estimator from Section 2 gives an expansion for the weighted estimator.
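
Theorem 3.3. Suppose that Assumptions 1 and 2 are satisfied and that
$\hat\vartheta$ is a $\sqrt n$ consistent estimator of $\vartheta$. Let $\bar h(\varepsilon) = E\{h(X,Y)|\varepsilon\}$. Then

$$\frac{1}{n}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n \Big(\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\Big[\bar h(\varepsilon_i) - E\bar h(\varepsilon_i) - \frac{\varepsilon_i}{\sigma^2}E\{\varepsilon \bar h(\varepsilon)\}\Big]\Big) + D_w^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}),$$

where $D_w = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon)) + \sigma^{-2}E\{\varepsilon \bar h(\varepsilon)\}
E\{\dot r_\vartheta(X)|Z = 1\}$.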



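Proof. Consider the two terms of representation (3.1) and replace them
by their approximations given in Corollary 2.2 and Lemma 3.2. This yields

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta)
&= \frac{1}{n}\sum_{i=1}^n \Big(\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\Big[\bar h(\varepsilon_i) - E\bar h(\varepsilon) - \frac{\varepsilon_i}{\sigma^2}E\{\varepsilon \bar h(\varepsilon)\}\Big]\Big) \\
&\quad + \Big[D + \frac{1}{\sigma^2}E\{\varepsilon \bar h(\varepsilon)\}E\{\dot r_\vartheta(X)|Z = 1\}\Big]^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2})
\end{aligned}$$

with $D + \sigma^{-2}E\{\varepsilon \bar h(\varepsilon)\}E\{\dot r_\vartheta(X)|Z = 1\} = D_w$, by definition of $D$ (see Corollary
2.2). Inserting this into the above gives the desired representation. □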



4. Efficiency. We are interested in efficient estimation of $Eh(X,Y)$ based
on observations $(X, ZY, Z)$. Our estimator requires an efficient estimator of
$\vartheta$. In this section we determine the influence function of an efficient estimator
of $Eh(X,Y)$. In the next section, where the influence function of an efficient
estimator $\hat\vartheta$ of $\vartheta$ is determined, this allows us to show that the fully imputed
estimator with an efficient $\hat\vartheta$ plugged in is efficient. Throughout we will
suppose that the assumptions made earlier are satisfied.

We first calculate the efficient influence function for estimating an arbitrary
functional $\kappa$ of the joint distribution $P(dx,dy,dz)$. The joint distribution
depends on the marginal distribution $G(dx)$ of $X$, the conditional
probability $\pi(x)$ of $Z = 1$ given $X = x$, and the conditional distribution
$Q(x,dy)$ of $Y$ given $X = x$,

$$P(dx,dy,dz) = G(dx)\,B_{\pi(x)}(dz)\{zQ(x,dy) + (1 - z)\delta_0(dy)\}.$$

Here $B_p = p\delta_1 + (1 - p)\delta_0$ denotes the Bernoulli distribution with parameter
$p$ and $\delta_t$ the Dirac measure at $t$. In a first step we consider a nonparametric
model for $P$, that is, we allow for arbitrary models for $G$, $Q$ and $\pi$. For
this general setting a characterization of efficient estimators of $\kappa(G,Q,\pi)$
is in Müller, Schick and Wefelmeyer (2006), Section 2. In the following we
summarize their key arguments and apply them to the special case of nonlinear
regression (which is not considered in that article). We then calculate
the efficient influence functions for estimating $Eh(X,Y)$ in the nonlinear
regression model and, in the next section, for estimating $\vartheta$.

For the characterization of efficient estimators it is essential to first intro- 
duce the notion of tangent spaces. The tangent space of a model is the set 
of possible perturbations of P within the model. An estimator of a certain 
functional is, roughly speaking, efficient if its influence function equals the 
so-called canonical gradient of the functional, which is an element of the tan- 
gent space. Hence, in order to characterize the efficient influence function, 
we first need to determine the tangent space. 

Consider (Hellinger differentiable) perturbations of $G$, $Q$ and $\pi$,

$$\begin{aligned}
G_{nu}(dx) &\doteq G(dx)\{1 + n^{-1/2}u(x)\}, \\
Q_{nv}(x,dy) &\doteq Q(x,dy)\{1 + n^{-1/2}v(x,y)\}, \\
B_{\pi_{nw}(x)}(dz) &\doteq B_{\pi(x)}(dz)[1 + n^{-1/2}\{z - \pi(x)\}w(x)].
\end{aligned}$$

To guarantee that the perturbed distributions are probability distributions
requires that the (Hellinger) derivative $u$ belongs to

$$L_{2,0}(G) = \Big\{u \in L_2(G) : \int u\,dG = 0\Big\},$$

that $v$ belongs to

$$V_0 = \Big\{v \in L_2(M) : \int v(x,y)\,Q(x,dy) = 0\Big\}$$

with $M(dx,dy) = Q(x,dy)G(dx)$, and that $w$ belongs to $L_2(G_\pi)$, where
$G_\pi(dx) = \pi(x)\{1 - \pi(x)\}G(dx)$. The perturbed joint distribution $P_{nuvw}$ then
has derivative $t_{uvw}(x,zy,z) = u(x) + zv(x,y) + \{z - \pi(x)\}w(x)$. Note that
models for $G$, $Q$ and $\pi$ will result in further restrictions on the perturbations
which must satisfy the model assumptions. Then $u$, $v$ and $w$ must be
restricted to subspaces $U$ of $L_{2,0}(G)$, $V$ of $V_0$ and $W$ of $L_2(G_\pi)$.

In this article we make no model assumptions on $G$ and $\pi$ and thus
have $U = L_{2,0}(G)$ and $W = L_2(G_\pi)$. Since we are considering nonlinear
regression we do, however, have a model for the conditional distribution,
namely $Q(x,dy) = f\{y - r_\vartheta(x)\}\,dy$ with $f$ denoting the (mean zero) density
of the error distribution. Perturbations $v$ of $Q$ must therefore satisfy
$\int v(x,y)f\{y - r_\vartheta(x)\}\,dy = 0$. In order to derive an explicit form of $V$, we
introduce perturbations $s$ and $t$ of the two parameters $f$ and $\vartheta$. Write $F$
for the distribution function of $f$ and remember that we assume that $f$ has
finite Fisher information for location, $E\ell^2(\varepsilon) < \infty$, where $\ell = -f'/f$ is the
score function. The perturbed distribution $Q$ now depends on $s$ and $t$,

$$Q_{nv}(x,dy) = Q_{nst}(x,dy) = f_{ns}\{y - r_{\vartheta_{nt}}(x)\}\,dy$$

with $\vartheta_{nt} = \vartheta + n^{-1/2}t$, $t \in \mathbb{R}^p$, $f_{ns}(y) = f(y)\{1 + n^{-1/2}s(y)\}$ and $s \in S$, where

$$S = \Big\{s \in L_2(F) : \int s(y)f(y)\,dy = 0,\ \int y\,s(y)f(y)\,dy = 0\Big\}.$$

Note that the space $S$ is determined by two constraints: the perturbed error
density $f_{ns}$ must integrate to 1, $\int f_{ns}(y)\,dy = 1$, and must be centered at
zero, $\int y f_{ns}(y)\,dy = 0$. As in Schick (1993), Section 3, we have

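$$\begin{aligned}
f_{ns}\{y - r_{\vartheta_{nt}}(x)\}
&= f\{y - r_{\vartheta_{nt}}(x)\}[1 + n^{-1/2}s\{y - r_{\vartheta_{nt}}(x)\}] \\
&\doteq [f\{y - r_\vartheta(x)\} - n^{-1/2}f'\{y - r_\vartheta(x)\}\dot r_\vartheta(x)^\top t]\,[1 + n^{-1/2}s\{y - r_\vartheta(x)\}] \\
&\doteq f\{y - r_\vartheta(x)\}\Big(1 + n^{-1/2}\Big[s\{y - r_\vartheta(x)\} - \frac{f'\{y - r_\vartheta(x)\}}{f\{y - r_\vartheta(x)\}}\dot r_\vartheta(x)^\top t\Big]\Big) \\
&= f\{y - r_\vartheta(x)\}(1 + n^{-1/2}[s\{y - r_\vartheta(x)\} + \ell\{y - r_\vartheta(x)\}\dot r_\vartheta(x)^\top t]).
\end{aligned}$$

Therefore

$$Q_{nst}(x,dy) \doteq f\{y - r_\vartheta(x)\}\,dy \times (1 + n^{-1/2}[s\{y - r_\vartheta(x)\} + \ell\{y - r_\vartheta(x)\}\dot r_\vartheta(x)^\top t])$$

and the subspace $V$ of $V_0$ is

$$V = \{v(x,y) = s\{y - r_\vartheta(x)\} + \ell\{y - r_\vartheta(x)\}\dot r_\vartheta(x)^\top t : s \in S,\ t \in \mathbb{R}^p\}. \tag{4.1}$$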

We now briefly review some definitions. We will do this for arbitrary
subspaces $U$, $V$ and $W$ of $L_{2,0}(G)$, $V_0$ and $L_2(G_\pi)$, and then return to our
specific situation.

Let $T$ denote the tangent space consisting of all derivatives $t_{uvw}$. A functional
$\kappa$ of $G$, $Q$ and $\pi$ is called differentiable with gradient $g \in L_2(P)$ if, for
all $u \in U$, $v \in V$ and $w \in W$,

$$n^{1/2}\{\kappa(G_{nu}, Q_{nv}, \pi_{nw}) - \kappa(G, Q, \pi)\} \to E\{g(X,ZY,Z)\,t_{uvw}(X,ZY,Z)\}. \tag{4.2}$$

The (unique) canonical gradient $g_* = g_*(X,ZY,Z)$ is the projection of $g(X,
ZY,Z)$ onto the tangent space $T$. It is easy to check that $T$ can be written
as an orthogonal sum of three subspaces,

$$T = \{u(X) : u \in U\} \oplus \{Zv(X,Y) : v \in V\} \oplus \{\{Z - \pi(X)\}w(X) : w \in W\}.$$

The random variable $g_*(X,ZY,Z)$ is therefore the sum $u_*(X) + Zv_*(X,Y) +
\{Z - \pi(X)\}w_*(X)$, where $u_*(X)$, $Zv_*(X,Y)$ and $\{Z - \pi(X)\}w_*(X)$ are the
projections of $g(X,ZY,Z)$ onto these subspaces.

An estimator $\hat\kappa$ for $\kappa$ is regular with limit $L$ if $L$ is a random variable
such that for all $u \in U$, $v \in V$ and $w \in W$,

$$n^{1/2}\{\hat\kappa - \kappa(G_{nu}, Q_{nv}, \pi_{nw})\} \Rightarrow L \quad \text{under } P_{nuvw}.$$

The Hájek-Le Cam convolution theorem says that $L$ is distributed as the
sum of a normal random variable $N$, with mean zero and variance $Eg_*^2$,
and some independent random variable. This justifies calling an estimator
$\hat\kappa$ efficient if it is regular with limit $L = N$. As a consequence, a regular
estimator is efficient if and only if it is asymptotically linear with influence
function $g_*$, that is,

$$n^{1/2}\{\hat\kappa - \kappa(G, Q, \pi)\} = n^{-1/2}\sum_{i=1}^n g_*(X_i, Z_iY_i, Z_i) + o_p(1).$$

A reference for the convolution theorem and the characterization is Bickel
et al. (1998).

Let us now specify the canonical gradient for the functional $Eh(X,Y)$.
The canonical gradient is, in particular, a gradient and thus specified by
(4.2). Moreover, it is characterized by $g_*(X, ZY, Z) = u_*(X) + Zv_*(X,Y) +
\{Z - \pi(X)\}w_*(X)$ with the terms of the sum being projections as stated
above. The canonical gradient for arbitrary $\kappa$ is therefore determined by

(4.3)   $E\{u_*(X)u(X)\} + E\{Zv_*(X,Y)v(X,Y)\} + E[\{Z - \pi(X)\}^2 w_*(X)w(X)]$
        $= \lim_{n\to\infty} n^{1/2}\{\kappa(G_{nu}, Q_{nv}, \pi_{nw}) - \kappa(G, Q, \pi)\}$.

In the nonlinear regression model we have, as defined earlier, $U = L_{2,0}(G)$,
$W = L_2(G_\pi)$, $Q_{nv} = Q_{nst}$ with $v \in V$, that is, $v(X,Y) = s(\varepsilon) + \ell(\varepsilon)\dot r_\vartheta(X)^\top t$
[see (4.1)]. Since $Eh(X,Y)$ does not depend on $\pi$ we have $Eh(X,Y) =
\kappa(G, Q, \pi) = \kappa(G, Q)$ and

$Eh(X,Y) = \int h\,dM = \int\!\!\int h(x,y)\,Q(x,dy)\,G(dx)$.

Let $M_{nuv}(dx,dy) = Q_{nv}(x,dy)\,G_{nu}(dx)$ with $Q_{nv} = Q_{nst} = f_{ns}\{y - r_{\vartheta_{nt}}(x)\}\,dy$
and perturbations $f_{ns}$ and $\vartheta_{nt}$ as defined earlier. Using the previous
approximations we see that the right-hand side of (4.3) is

$\lim_{n\to\infty} n^{1/2}\Bigl(\int h\,dM_{nuv} - \int h\,dM\Bigr) = E[h(X,Y)\{u(X) + v(X,Y)\}]$

with $v(X,Y) = s(\varepsilon) + \ell(\varepsilon)\dot r_\vartheta(X)^\top t$. The canonical gradient of $Eh(X,Y)$
is therefore determined by

(4.4)   $E\{u_*(X)u(X)\} + E\{Zv_*(X,Y)v(X,Y)\} + E[\{Z - \pi(X)\}^2 w_*(X)w(X)]$
        $= E[h(X,Y)\{u(X) + v(X,Y)\}]$

for all $u \in U$, $v \in V$ and $w \in W$ with $v$ of the above form.

In order to specify $g_*$ we set $u = 0$ and $v = 0$ in (4.4) and see that $w_*$
must be zero. Setting $v = 0$, we see that $u_*(X)$ is the projection of $h(X,Y)$
onto $U = L_{2,0}(G)$, that is, $u_*(X) = \chi(X,\vartheta) - E\{\chi(X,\vartheta)\}$ with $\chi(X,\vartheta) =
E\{h(X,Y)|X\}$. Hence we have

(4.5)   $g_*(X, ZY, Z) = \chi(X,\vartheta) - E\{\chi(X,\vartheta)\} + Zv_*(X,Y)$

and are left to determine $v_*$. Taking $u = 0$ in (4.4), we see that the projection
of $Zv_*(X,Y)$ onto $\{v(X,Y) : v \in V\}$ must equal the projection of
$h(X,Y)$ onto the same space, that is, onto

$\{s(\varepsilon) + \ell(\varepsilon)\dot r_\vartheta(X)^\top t : s \in S,\ t \in \mathbb{R}^p\}$.

There are two possible ways to obtain $v_*$. One method would be to make an
educated guess: in Theorem 3.3 we derived an approximation of an estimator
of $Eh(X,Y)$ which we expect to be efficient since it uses all information
about the model. The approximation still involves $\hat\vartheta - \vartheta$ but, combined with
the efficient influence function for estimating $\vartheta$ (which is relatively easy
to derive; see Section 5), it will suggest a candidate for $v_*$. Whether this
candidate is the correct $v_*$ can be checked with characterization (4.4), that
is, with

(4.6)   $E[Zv_*(X,Y)\{s(\varepsilon) + \ell(\varepsilon)\dot r_\vartheta(X)^\top t\}] = E[h(X,Y)\{s(\varepsilon) + \ell(\varepsilon)\dot r_\vartheta(X)^\top t\}]$.

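The other method uses the structure of the tangent space. The canonical
gradient $v_*$ is characterized in terms of projections onto $V$. Its derivation
as a projection onto $V$ is simplified by decomposing $V$. Let $\ell_0$ denote the
projection of $\ell$ onto $S$,

$\ell_0(\varepsilon) = \ell(\varepsilon) - \varepsilon/\sigma^2$,

and note that $\ell_0 = 0$ is possible, namely when the error density $f$ is normal.
We now introduce the notation

$\zeta = [\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\,\ell(\varepsilon) + E\{\dot r_\vartheta(X)|Z = 1\}\,\varepsilon/\sigma^2$

and, for $s \in S$ and $t \in \mathbb{R}^p$, write

$s(\varepsilon) + t^\top\dot r_\vartheta(X)\ell(\varepsilon)$
  $= s(\varepsilon) + t^\top[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon) + t^\top E\{\dot r_\vartheta(X)|Z = 1\}\bigl[\{\ell(\varepsilon) - \varepsilon/\sigma^2\} + \varepsilon/\sigma^2\bigr]$
  $= t^\top\zeta + s(\varepsilon) + t^\top E\{\dot r_\vartheta(X)|Z = 1\}\ell_0(\varepsilon)$

with $s(\varepsilon) + t^\top E\{\dot r_\vartheta(X)|Z = 1\}\ell_0(\varepsilon) \in S$. Any element of $V$ can therefore be
written $t^\top\zeta + s(\varepsilon)$ for some $t \in \mathbb{R}^p$ and $s \in S$. Since the canonical gradient
$v_*$ is in $V$ by definition, it must be of the form

$v_*(X,Y) = s_*(\varepsilon) + t_*^\top\zeta$

with $s_* \in S$ and $t_* \in \mathbb{R}^p$ to be determined such that (4.6) holds, that is, after
our above considerations,

$E[Z\{s_*(\varepsilon) + t_*^\top\zeta\}\{s(\varepsilon) + t^\top\zeta\}] = E[h(X,Y)\{s(\varepsilon) + t^\top\zeta\}]$

for all $t \in \mathbb{R}^p$ and $s \in S$.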

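We first consider $t = 0$ and secondly $s = 0$ and, in both cases, use the fact
that $Z\zeta$ is orthogonal to $S$. Then the above characterization of $s_*$ and $t_*$
reduces to two equations, namely

(4.7)   $E\{Zs_*(\varepsilon)s(\varepsilon)\} = E\{h(X,Y)s(\varepsilon)\}$ for all $s \in S$,

(4.8)   $E\{Zt_*^\top\zeta\,t^\top\zeta\} = E\{h(X,Y)\,t^\top\zeta\}$ for all $t \in \mathbb{R}^p$.

Consider (4.7) and again use the notation $\bar h(\varepsilon)$ for the conditional expectation
$E\{h(X,Y)|\varepsilon\}$. Then (4.7) can be written as $E\{Zs_*(\varepsilon)s(\varepsilon)\} = E\{\bar h(\varepsilon)s(\varepsilon)\}$,
that is, $\bar h(\varepsilon)/EZ$ is an obvious candidate for $s_*$. However, it is not
(yet) in $S$: the desired $s_*$ is obtained as its centered version with a correction
term chosen such that $s_* \in S$,

$s_*(\varepsilon) = \frac{1}{EZ}\Bigl[\bar h(\varepsilon) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon\Bigr]$.

The vector $t_*$ is obtained by solving (4.8), $t_*^\top E(Z\zeta\zeta^\top)t = E\{h(X,Y)\zeta^\top\}t$
for all $t \in \mathbb{R}^p$. Now use the definition of $\zeta$ from above and the definition of the
vector $D_*$ from the end of the previous section, $D_* = E(h(X,Y)[\dot r_\vartheta(X) -
E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon)) + \sigma^{-2}E\{\varepsilon\bar h(\varepsilon)\}E\{\dot r_\vartheta(X)|Z = 1\}$, and assume that
$E(Z\zeta\zeta^\top)$ is invertible to obtain

$t_*^\top = E\{h(X,Y)\zeta^\top\}\,E(Z\zeta\zeta^\top)^{-1}$
  $= E\Bigl[h(X,Y)\Bigl([\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]^\top\ell(\varepsilon) + E\{\dot r_\vartheta(X)|Z = 1\}^\top\frac{\varepsilon}{\sigma^2}\Bigr)\Bigr]E(Z\zeta\zeta^\top)^{-1}$
  $= D_*^\top E(Z\zeta\zeta^\top)^{-1}$.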

This completes the derivation of $v_*(X,Y) = s_*(\varepsilon) + t_*^\top\zeta$,

(4.9)   $v_*(X,Y) = \frac{1}{EZ}\Bigl[\bar h(\varepsilon) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon\Bigr] + D_*^\top E(Z\zeta\zeta^\top)^{-1}\zeta$.

Equations (4.5) and (4.9) together finally yield the canonical gradient $g_*$,
which is given in the following lemma. Note that we now have the additional
assumption that $E(Z\zeta\zeta^\top)$ is invertible, where $E(Z\zeta\zeta^\top)$ involves the
covariance matrix of $Z\dot r_\vartheta(X)$ and the Fisher information $E\ell^2(\varepsilon)$.

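Lemma 4.1. Let $\bar h(\varepsilon) = E\{h(X,Y)|\varepsilon\}$, $\zeta = [\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon) +
\sigma^{-2}E\{\dot r_\vartheta(X)|Z = 1\}\varepsilon$ and $D_* = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon)) +
\sigma^{-2}E\{\varepsilon\bar h(\varepsilon)\}E\{\dot r_\vartheta(X)|Z = 1\} = E\{h(X,Y)\zeta\}$. Suppose additionally to the
model assumptions from Section 2 that $E(Z\zeta\zeta^\top)$ is invertible. Then the
canonical gradient of the functional $Eh(X,Y)$ is

(4.10)   $g_*(X, ZY, Z) = \chi(X,\vartheta) - E\{\chi(X,\vartheta)\} + \frac{Z}{EZ}\Bigl[\bar h(\varepsilon) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon\Bigr] + D_*^\top E(Z\zeta\zeta^\top)^{-1}Z\zeta$.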
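
As a consistency check on (4.10), the mean response $h(x,y) = y$ can be worked through by hand. The short derivation below is a sketch only. It assumes the standard score identities $E\ell(\varepsilon) = 0$ and $E\{\varepsilon\ell(\varepsilon)\} = 1$ (integration by parts under the usual regularity conditions), which are not restated in this section, and it writes $\mu = E\{\dot r_\vartheta(X)|Z = 1\}$ as shorthand.

```latex
\begin{align*}
\chi(X,\vartheta) &= r_\vartheta(X), \qquad
\bar h(\varepsilon) = E r_\vartheta(X) + \varepsilon, \qquad
E\{\varepsilon\bar h(\varepsilon)\} = \sigma^2, \\
\bar h(\varepsilon) - E\bar h(\varepsilon)
   - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon
   &= \varepsilon - \varepsilon = 0, \\
D_* &= E\bigl(\{r_\vartheta(X) + \varepsilon\}\,[\dot r_\vartheta(X) - \mu]\,\ell(\varepsilon)\bigr)
       + \sigma^{-2}\sigma^2\mu \\
    &= \{E\dot r_\vartheta(X) - \mu\} + \mu = E\dot r_\vartheta(X), \\
g_*(X, ZY, Z) &= r_\vartheta(X) - E r_\vartheta(X)
   + E\dot r_\vartheta(X)^\top E(Z\zeta\zeta^\top)^{-1} Z\zeta .
\end{align*}
```

In this degenerate case the middle term of (4.10) vanishes, so the only contributions come from the covariate average and from the estimation of $\vartheta$.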



5. Estimation of the parameter and main result. In this section we show
that the weighted estimator for $Eh(X,Y)$ with an efficient estimator for
$\vartheta$ plugged in is asymptotically linear with influence function equal to the
canonical gradient, that is, it is efficient. Let us compare the expansion of
the weighted estimator from Theorem 3.3 and the efficient influence function
which is given by the canonical gradient (4.10) in Lemma 4.1. The approximation
of $n^{-1/2}\sum_{i=1}^n[\hat\chi_w(X_i,\hat\vartheta) - E\{\chi(X,\vartheta)\}]$ which we derived in Section
3 is

$n^{-1/2}\sum_{i=1}^n\Bigl(\chi(X_i,\vartheta) - E\{\chi(X,\vartheta)\} + \frac{Z_i}{EZ}\Bigl[\bar h(\varepsilon_i) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon_i\Bigr]\Bigr) + D_*^\top n^{1/2}(\hat\vartheta - \vartheta)$,

where $D_* = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon)) + \sigma^{-2}E\{\varepsilon\bar h(\varepsilon)\}
E\{\dot r_\vartheta(X)|Z = 1\}$. The efficient influence function determined by the canonical
gradient is

$\chi(X,\vartheta) - E\{\chi(X,\vartheta)\} + \frac{Z}{EZ}\Bigl[\bar h(\varepsilon) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon\Bigr] + D_*^\top E(Z\zeta\zeta^\top)^{-1}Z\zeta$

with $\zeta = [\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon) + \sigma^{-2}E\{\dot r_\vartheta(X)|Z = 1\}\varepsilon$. Using an
estimator $\hat\vartheta$ with influence function $E(Z\zeta\zeta^\top)^{-1}Z\zeta$ would therefore yield an
efficient estimator for $Eh(X,Y)$. In fact, it is easy to check (this will be
done in the following lemma) that this influence function is the canonical
gradient of the functional $\kappa(G, Q, \pi) = \vartheta$. This means that our estimator of
$Eh(X,Y)$ requires an efficient estimator for $\vartheta$ to be plugged in in order
to be efficient.



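Lemma 5.1. Let $\zeta = [\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon) + \sigma^{-2}E\{\dot r_\vartheta(X)|Z =
1\}\varepsilon$ and suppose that $E(Z\zeta\zeta^\top)$ is invertible. An asymptotically linear estimator
$\hat\vartheta$ for $\vartheta$ with influence function $E(Z\zeta\zeta^\top)^{-1}Z\zeta$, that is,

$n^{1/2}(\hat\vartheta - \vartheta) = n^{-1/2}\sum_{i=1}^n E(Z\zeta\zeta^\top)^{-1}Z_i\Bigl(\{\dot r_\vartheta(X_i) - E[\dot r_\vartheta(X)|Z = 1]\}\ell(\varepsilon_i) + E\{\dot r_\vartheta(X)|Z = 1\}\frac{\varepsilon_i}{\sigma^2}\Bigr) + o_p(1)$,

is efficient for $\vartheta$.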



Proof. We have a semiparametric model for the conditional distribution,
namely $Q(x,dy) = f\{y - r_\vartheta(x)\}\,dy$, and nonparametric models for $G$
and $\pi$. The functional $\vartheta \in \mathbb{R}^p$ is therefore a functional of $Q$, $\kappa(G, Q, \pi) =
\kappa(Q) = \vartheta$. By the discussion of the previous section we must show that the
influence function of the estimator equals the canonical gradient, which is,
for arbitrary functionals $\kappa$, determined by (4.3). For the functional $\vartheta$ the
right-hand side of (4.3) is simply $n^{1/2}\{(\vartheta + n^{-1/2}t) - \vartheta\} = t$. From Section
4 we also know that in the nonlinear regression model any $v$ in $V$ is of the
form $v(X,Y) = s(\varepsilon) + t^\top\zeta$, where $s \in S$ and $t \in \mathbb{R}^p$. The canonical gradient
$u_*(X) + Zv_*(X,Y) + \{Z - \pi(X)\}w_*(X)$ is therefore characterized by

$E\{u_*(X)u(X)\} + E[Zv_*(X,Y)\{s(\varepsilon) + \zeta^\top t\}] + E[\{Z - \pi(X)\}^2 w_*(X)w(X)] = t$.

Taking $s = 0$, $t = 0$ and $w = 0$ we see that $u_* = 0$. Analogously one obtains
that $w_*$ must be zero. The canonical gradient thus reduces to $Zv_*(X,Y)$.
Again, since $v_* \in V$, we write $Zv_*(X,Y) = Zs_*(\varepsilon) + Z\zeta^\top t_*$ with $s_*$ and $t_*$
to be determined. Taking $t = 0$ we see that $Zv_*$ must be orthogonal to $S$,
that is, $s_* = 0$, which yields $Zv_*(X,Y) = Z\zeta^\top t_*$. The above characterization
therefore reduces to

$t = E[Z\zeta^\top t_*\{s(\varepsilon) + \zeta^\top t\}] = t_*^\top E(Z\zeta\zeta^\top)t$ for all $t \in \mathbb{R}^p$.

This gives $t_* = E(Z\zeta\zeta^\top)^{-1}$ and the proof is complete: the canonical gradient
of the parameter $\vartheta$ is $Zv_*(X,Y) = Zt_*^\top\zeta = E(Z\zeta\zeta^\top)^{-1}Z\zeta$. □

Note that the asymptotic variance of $\hat\vartheta$ is $E(Z\zeta\zeta^\top)^{-1}$. The assumption
that $E(Z\zeta\zeta^\top)$ must be invertible is therefore a condition on the covariance
matrix of an efficient estimator of $\vartheta$, which we require to have full rank.
Lemma 5.1 combined with the previous discussion yields our main result,
which is given in the following theorem. Note that the asymptotic variance
of the fully imputed estimator of $Eh(X,Y)$ is $Eg_*^2$, where $g_*$ is the canonical
gradient from (4.10). This variance is also given in the theorem below
and is easily verified by taking into account that the three terms of $g_*$ are
orthogonal.

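Theorem 5.2. Assume that Assumptions 1 and 2 hold and that the
covariance matrices of $\dot r_\vartheta(X)$ and of $Z\dot r_\vartheta(X)$ are invertible. Let $\hat\vartheta$ be an
asymptotically linear estimator of $\vartheta$ with influence function $E(Z\zeta\zeta^\top)^{-1}Z\zeta$,
where $\zeta = [\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon) + \sigma^{-2}E\{\dot r_\vartheta(X)|Z = 1\}\varepsilon$. Then the
estimator $n^{-1}\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta)$ with $\hat\chi_w(x,\hat\vartheta) = \sum_{j=1}^n\hat w_jZ_jh\{x, r_{\hat\vartheta}(x) + Y_j -
r_{\hat\vartheta}(X_j)\}/\sum_{j=1}^n Z_j$ has the expansion

$\frac{1}{n}\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta) = \frac{1}{n}\sum_{i=1}^n\Bigl(\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\Bigl[\bar h(\varepsilon_i) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon_i\Bigr]$
    $+ D_*^\top E(Z\zeta\zeta^\top)^{-1}Z_i\Bigl\{[\dot r_\vartheta(X_i) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon_i) + E\{\dot r_\vartheta(X)|Z = 1\}\frac{\varepsilon_i}{\sigma^2}\Bigr\}\Bigr) + o_p(n^{-1/2})$,

where $D_* = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)|Z = 1\}]\ell(\varepsilon)) + \sigma^{-2}E\{\varepsilon\bar h(\varepsilon)\}
E\{\dot r_\vartheta(X)|Z = 1\}$ and $\bar h(\varepsilon) = E\{h(X,Y)|\varepsilon\}$. In particular, it is an efficient
estimator of $Eh(X,Y)$ and asymptotically normally distributed with asymptotic
variance

$E\chi^2(X,\vartheta) + \frac{1}{EZ}E\bar h^2(\varepsilon) - \Bigl(1 + \frac{1}{EZ}\Bigr)E^2h(X,Y) - \frac{E^2\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2EZ} + D_*^\top E(Z\zeta\zeta^\top)^{-1}D_*$.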


In the linear regression model without missing responses, efficient estimators
for $\vartheta$ have been constructed by Bickel (1982), Koul and Susarla (1983) and
Schick (1987, 1993). Schick (1993) considers general regression models with
arbitrary sets of identifiability assumptions and discusses the mean zero
constraint on the error distribution as an important example. His construction
of an efficient estimator requires a preliminary estimate of $\vartheta$ and a direct
estimator of the influence function. The influence function for the nonlinear
regression model with mean zero errors [see Schick (1993), Section 4.1 and
Remark 3.13] is $E(\xi\xi^\top)^{-1}\xi$ with $\xi = [\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)\}]\ell(\varepsilon) + E\{\dot r_\vartheta(X)\}\varepsilon/\sigma^2$
and therefore consistent with our findings. A further developed efficient
estimator, which requires weaker conditions, is in Forrester et al. (2003). In
the model with missing responses an efficient estimator can be constructed
analogously, using only the (available) full observations. Note that the only
difference in the construction is that the data are incomplete, that is, the
presence of indicators $Z_j$. In the following we will briefly sketch this "one-step
improvement" construction of the estimator and refer to Forrester et
al. (2003) for details.

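Let $\tilde\vartheta$ denote a $\sqrt n$-consistent and discretized estimator of $\vartheta$, that is,
with values on a rectangular grid with side lengths of order $n^{-1/2}$. Write
$\mu(\vartheta)$ for $E\{\dot r_\vartheta(X)|Z = 1\}$, $\varepsilon(\vartheta)$ for the error variables $\varepsilon(\vartheta) = Y - r_\vartheta(X)$ and
$\zeta_\vartheta\{X,\varepsilon(\vartheta)\}$ for $\zeta$, that is,

$\zeta_\vartheta\{X,\varepsilon(\vartheta)\} = \{\dot r_\vartheta(X) - \mu(\vartheta)\}\,\ell\{\varepsilon(\vartheta)\} + \mu(\vartheta)\,\varepsilon(\vartheta)/\sigma^2$.

In order to estimate the influence function one replaces the unknown quantities
by estimators. The estimator of $\vartheta$ is then of the form

$\hat\vartheta = \tilde\vartheta + \hat I^{-1}\,\frac{1}{n}\sum_{i=1}^n Z_i\,\hat\zeta_{\tilde\vartheta}\{X_i,\varepsilon_i(\tilde\vartheta)\}$,

where

$\hat I = \frac{1}{n}\sum_{i=1}^n Z_i\,\hat\zeta_{\tilde\vartheta}\{X_i,\varepsilon_i(\tilde\vartheta)\}\,\hat\zeta_{\tilde\vartheta}\{X_i,\varepsilon_i(\tilde\vartheta)\}^\top$

with

$\hat\zeta_\vartheta\{X,\varepsilon(\vartheta)\} = \{\dot r_\vartheta(X) - \hat\mu(\vartheta)\}\,\hat\ell\{\varepsilon(\vartheta)\} + \hat\mu(\vartheta)\,\varepsilon(\vartheta)/\hat\sigma^2(\vartheta)$,
$\hat\mu(\vartheta) = \sum_{j=1}^n Z_j\dot r_\vartheta(X_j)\Big/\sum_{j=1}^n Z_j$,   $\hat\sigma^2(\vartheta) = \sum_{j=1}^n Z_j\varepsilon_j^2(\vartheta)\Big/\sum_{j=1}^n Z_j$,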

and an estimator $\hat\ell$ of the score function. To describe this estimator let $k$
be a kernel that satisfies the assumptions given in Section 8 of Forrester et
al. (2003), for example, a logistic density. For a bandwidth $a_n \to 0$ we set $k_n(x) =
k(x/a_n)/a_n$. The estimator of the score function $\ell$ is a kernel estimator based
on the available residuals $\varepsilon(\tilde\vartheta)$,

$\hat\ell(x) = -\frac{\hat f_n'(x)}{b_n + \hat f_n(x)}$

with $\hat f_n(x) = \sum_{j=1}^n Z_jk_n\{x - \varepsilon_j(\tilde\vartheta)\}\big/\sum_{j=1}^n Z_j$ and where $b_n$ is a sequence of
positive numbers converging to zero. The orders of $a_n \to 0$ and $b_n \to 0$ (which
also apply if only a fixed fraction of the $n$ data pairs is observed) are given
in Forrester et al. (2003).
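
To make the kernel score estimator concrete, here is a minimal numerical sketch. It assumes the truncated form $-\hat f_n'/(b_n + \hat f_n)$ with a logistic kernel and normalization by the number of observed responses; the function names, the bandwidth `a_n = 0.3` and the constant `b_n = 1e-3` are illustrative choices, not the rates required by Forrester et al. (2003).

```python
import numpy as np

def logistic_kernel(u):
    # Logistic density k(u) = e^{-u} / (1 + e^{-u})^2, written via cosh.
    return 0.25 / np.cosh(u / 2.0) ** 2

def logistic_kernel_deriv(u):
    # Derivative of the logistic density: k'(u) = -k(u) * tanh(u / 2).
    return -logistic_kernel(u) * np.tanh(u / 2.0)

def score_estimate(x, resid, z, a_n, b_n):
    """Kernel estimate of the location score ell = -f'/f at the points x.

    resid: residuals (any finite value where the response is missing)
    z:     0/1 indicators; only residuals with z == 1 contribute
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - resid[None, :]) / a_n
    m = z.sum()
    f_hat = (z * logistic_kernel(u)).sum(axis=1) / (a_n * m)
    f_hat_deriv = (z * logistic_kernel_deriv(u)).sum(axis=1) / (a_n ** 2 * m)
    return -f_hat_deriv / (b_n + f_hat)

# Illustration with standard normal residuals, for which ell(x) = x.
rng = np.random.default_rng(3)
n = 5000
resid = rng.standard_normal(n)
z = (rng.uniform(size=n) < 0.5).astype(float)
grid = np.array([-1.0, 0.0, 1.0])
ell_hat = score_estimate(grid, resid, z, a_n=0.3, b_n=1e-3)
print(ell_hat)
```

Because the kernel smooths the density, the estimate at a fixed bandwidth is somewhat flatter than the true score; the bias disappears only as the bandwidth shrinks with $n$.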

There are other simple estimators for $\vartheta$ available which, however, and in
contrast to the estimators proposed by Schick (1987, 1993) and Forrester
et al. (2003), are not efficient for $\vartheta$ and which, if used for plug-in, would
yield inefficient estimators of $Eh(X,Y)$. One could, for example, estimate
$\vartheta$ by a weighted least squares estimator, that is, by the solution $t = \hat\vartheta$ of
an estimating equation $\sum_{i=1}^n Z_iw_t(X_i)\{Y_i - r_t(X_i)\} = 0$. Such an estimator
would be appropriate in a regression model where independence of errors
and covariates cannot be assumed. Then one could even obtain efficiency for
suitably chosen weights [see Müller (2007), for nonlinear regression without
missing responses]. The estimating equation can be regarded as an empirical
version of the equation $E[Zw_t(X)\{Y - r_t(X)\}] = 0$. If a solution $t = \vartheta$
of this equation exists, the solution $\hat\vartheta$ of the empirical version will, in general,
be consistent for $\vartheta$. If one is not interested in efficiency, the estimator
$n^{-1}\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta)$ with a least squares estimator plugged in would yield
a consistent estimator for $Eh(X,Y)$ (but not an efficient one since the independence
structure is not used). Alternatively, the least squares estimator
can be used as a preliminary estimator for the one-step improvement approach
sketched above.
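
The estimating equation can be solved numerically. The sketch below takes the ordinary least squares choice $w_t(x) = \dot r_t(x)$ for the model $r_t(x) = \cos(tx)$ and uses Gauss-Newton iterations on the observed pairs; the function names, the starting value and the sample size are illustrative assumptions, and the data are stored as $(X_i, Z_iY_i, Z_i)$.

```python
import numpy as np

def r(x, t):
    return np.cos(t * x)

def r_dot(x, t):
    # Derivative of cos(t * x) with respect to t.
    return -x * np.sin(t * x)

def least_squares_observed(x, zy, z, t0, iters=200, tol=1e-12):
    """Solve sum_i Z_i r_dot_t(X_i) {Y_i - r_t(X_i)} = 0 by Gauss-Newton."""
    t = t0
    for _ in range(iters):
        grad = r_dot(x, t)
        resid = z * (zy - r(x, t))          # zero where the response is missing
        step = np.sum(grad * resid) / np.sum(z * grad ** 2)
        t += step
        if abs(step) < tol:
            break
    return t

rng = np.random.default_rng(2)
n, theta = 2000, 2.0
x = rng.uniform(-1.0, 1.0, n)
y = r(x, theta) + rng.standard_normal(n)
z = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
zy = z * y                                   # responses are only used where Z = 1

theta_hat = least_squares_observed(x, zy, z, t0=1.8)
score = np.sum(z * r_dot(x, theta_hat) * (zy - r(x, theta_hat)))
print(theta_hat, score)
```

Since $\cos(tx)$ is even in $t$, the equation has a mirror solution near $-\vartheta$; a positive starting value keeps the iteration on the intended branch.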

6. Special cases, simulations and inference. Sometimes the estimator 
simplifies considerably, especially if we study simple special cases such as 
estimation of expectations Eh{X,Y) where h has a simple form. The main 
result from Theorem 5.2 is therefore useful in proving efficiency of existing 
approaches for specific applications, or in improving them, and for com- 
parisons of competing methods. Theorem 5.2 further provides the limiting 
distribution of the efficient estimator, which facilitates the construction of 
confidence intervals. We will address this and aspects of the construction of 
estimators in the following, and illustrate the results with simulations. 

6.1. Special cases. We have shown that the fully imputed weighted estimator
$n^{-1}\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta)$ with

$\hat\chi_w(x,\hat\vartheta) = \sum_{j=1}^n\hat w_jZ_jh\{x, r_{\hat\vartheta}(x) + Y_j - r_{\hat\vartheta}(X_j)\}\Big/\sum_{j=1}^n Z_j$

is efficient for $Eh(X,Y)$ where $h(X,Y)$ is a known square-integrable function.
The literature usually deals with estimation of the mean response, that
is, $h(x,y) = y$. Other important examples are estimation of higher moments
of the response variable $Y$ and the estimation of the covariance and of mixed
moments of $X$ and $Y$. In all these cases $h(x,y)$ is a polynomial in $x$ and $y$
and the estimator often simplifies. This holds for the mean response, and,
more generally, when $h$ is of the form $h(x,y) = a(x)y$. Then the estimator
reduces to an unweighted empirical estimator, which can be seen as follows.
Recall that the weights must be chosen such that $\sum_{j=1}^n\hat w_jZ_j\hat\varepsilon_j = 0$, and that
$\hat w_j = 1 - \hat\lambda\hat w_jZ_j\hat\varepsilon_j$, which gives $\sum_{j=1}^n\hat w_jZ_j/\sum_{j=1}^n Z_j = 1$. Hence the estimator
for $E\{a(X)Y\}$ is

$\frac{1}{n}\sum_{i=1}^n a(X_i)\,\frac{\sum_{j=1}^n\hat w_jZ_j\{r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\}}{\sum_{j=1}^n Z_j} = \frac{1}{n}\sum_{i=1}^n a(X_i)\,r_{\hat\vartheta}(X_i)$.

In these cases it is therefore not necessary to determine weights: the above
intuitive estimator, with an efficient estimator for $\vartheta$ plugged in, is efficient
for $E\{a(X)Y\}$.
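
The reduction can be checked numerically. The following sketch assumes weights of the form $\hat w_j = 1/(1 + \hat\lambda Z_j\hat\varepsilon_j)$, which is what the relation $\hat w_j = 1 - \hat\lambda\hat w_jZ_j\hat\varepsilon_j$ amounts to, and solves the constraint for $\hat\lambda$ by Newton's method. The helper names `residual_weights` and `fully_imputed` are hypothetical, and the data are stored as $(X_i, Z_iY_i, Z_i)$.

```python
import numpy as np

def residual_weights(z, resid, iters=100, tol=1e-12):
    """Weights w_j = 1 / (1 + lam * Z_j * e_j) with sum_j w_j Z_j e_j = 0."""
    e = z * resid                            # zero for missing responses
    lam = 0.0
    for _ in range(iters):
        denom = 1.0 + lam * e
        g = np.sum(e / denom)                # constraint as a function of lam
        dg = -np.sum((e / denom) ** 2)       # its derivative (negative)
        step = g / dg
        # Halve the Newton step until all weights stay positive.
        while np.any(1.0 + (lam - step) * e <= 0.0):
            step *= 0.5
        lam -= step
        if abs(step) < tol:
            break
    return 1.0 / (1.0 + lam * e)

def fully_imputed(x, zy, z, r, theta_hat, h):
    """Weighted fully imputed estimator of E h(X, Y)."""
    fitted = r(x, theta_hat)
    resid = z * (zy - fitted)
    w = residual_weights(z, resid)
    imputed = fitted[:, None] + resid[None, :]           # r(X_i) + e_j
    vals = h(x[:, None], imputed)                        # n x n array
    chi_hat = (vals * (w * z)[None, :]).sum(axis=1) / z.sum()
    return chi_hat.mean(), w, resid

rng = np.random.default_rng(0)
n, theta = 200, 2.0
x = rng.uniform(-1.0, 1.0, n)
y = theta * x + rng.standard_normal(n)
z = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
zy = z * y

def r(x, t):
    return t * x

theta_hat = np.sum(z * x * zy) / np.sum(z * x * x)       # least squares, observed pairs
est_mean, w, resid = fully_imputed(x, zy, z, r, theta_hat, lambda x, y: y)
plain_mean = np.mean(r(x, theta_hat))
print(est_mean, plain_mean)
```

For $h(x,y) = y$ the weighted estimator and the plain average of $r_{\hat\vartheta}(X_i)$ agree up to numerical error; for other $h$, for example `lambda x, y: x * np.exp(x * y)`, the weights generally matter.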

An interesting special case is estimation of the mean response, $a(X) =
1$, when possibly all responses are observed, which we mentioned in the
Introduction. Regardless of whether there are missing responses or not,
$n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$ is efficient for $EY$, provided $\hat\vartheta$ is efficient for $\vartheta$. The difference
between the two situations is the construction of $\hat\vartheta$, which will be based
on either complete data pairs or on missing response data. Let us stay with
this example and consider, for a comparison, the unweighted estimator (1.1)
from the Introduction, that is, with all weights equal to one. It involves the
term $\sum_{j=1}^n Z_j\hat\varepsilon_j/\sum_{j=1}^n Z_j$, which is nonzero. If all responses are observable,
the unweighted estimator further simplifies, namely to

$\frac{1}{n}\sum_{i=1}^n r_{\hat\vartheta}(X_i) + \frac{1}{n}\sum_{j=1}^n\{Y_j - r_{\hat\vartheta}(X_j)\} = \frac{1}{n}\sum_{i=1}^n Y_i$

[whereas the weighted estimator is $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$]. Its influence function
is $Y - EY$, which is clearly not the efficient one: our efficient estimator for
$EY$ (with an efficient estimator $\hat\vartheta$) has the expansion

$\frac{1}{n}\sum_{i=1}^n r_{\hat\vartheta}(X_i) = \frac{1}{n}\sum_{i=1}^n r_\vartheta(X_i) + (\hat\vartheta - \vartheta)^\top E\dot r_\vartheta(X) + o_p(n^{-1/2})$.

We recognize this as the expansion from Theorem 3.3 with $D_* = E\dot r_\vartheta(X)$.
Even without inserting the expansion for $\hat\vartheta - \vartheta$ from the previous section, it is
clear that this is, in general, not the influence function of $n^{-1}\sum_{i=1}^n Y_i$, which
shows that the empirical estimator cannot be efficient. Note that $n^{-1}\sum_{i=1}^n Y_i$ also coincides with
the (inefficient) partially imputed estimator if all responses were observed.

6.2. Simulations. For an illustration with computer simulations we consider
a linear regression function, $r_\vartheta(X) = \vartheta X$ with $\vartheta = 2$, and a nonlinear
regression function, $r_\vartheta(X) = \cos(\vartheta X)$, also with $\vartheta = 2$. The probabilities
$\pi(X) = P(Z = 1|X) = E(Z|X)$ are chosen as values of a logistic distribution
function, $\pi(X) = 1/(1 + e^{-X})$, so that on average one half of the simulated
responses are missing. We generate covariates $X$ from a uniform distribution
on the interval $(-1, 1)$ and error variables $\varepsilon$ from a standard normal
distribution. If the errors are in fact normally distributed then $\ell(\varepsilon) = \varepsilon/\sigma^2$
and the efficient one-step improvement estimator for $\vartheta$ from the previous
section is asymptotically equivalent to the ordinary least squares estimator.
The following considerations can therefore be based on this straightforward
estimation approach.

In a first example we consider estimation of the mean response $EY$ and
compare the efficient (fully imputed weighted) estimator, which, as seen
above, here simplifies to $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$, with the partially imputed estimator
$n^{-1}\sum_{i=1}^n\{Z_iY_i + (1 - Z_i)r_{\hat\vartheta}(X_i)\}$. We also study the performance of
these estimators if the parameter estimates are replaced by their true values,
and if all responses are observed, $\pi(\cdot) = 1$. Further we calculate the first simple
estimator from the Introduction, $n^{-1}\sum_{i=1}^n Z_iY_i/\hat\pi(X_i)$, with, for reasons
of simplicity, the estimated probabilities $\hat\pi$ replaced by the true ones. The
values of the simulated mean squared errors are given in Table 1.
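
The linear-regression part of this experiment is easy to reproduce in outline. The sketch below is not the code behind Table 1; the seed, the number of replications and the variable names are assumptions, so the resulting numbers will only be of the same order as the tabulated ones.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 50, 2000

# One row per simulated sample.
x = rng.uniform(-1.0, 1.0, size=(reps, n))
y = theta * x + rng.standard_normal((reps, n))
prob = 1.0 / (1.0 + np.exp(-x))                      # pi(X), logistic
z = (rng.uniform(size=(reps, n)) < prob).astype(float)

# Least squares through the origin, using observed pairs only.
theta_hat = (z * x * y).sum(axis=1) / (z * x * x).sum(axis=1)
fitted = theta_hat[:, None] * x

estimates = {
    "FI": fitted.mean(axis=1),                        # fully imputed
    "PI": (z * y + (1.0 - z) * fitted).mean(axis=1),  # partially imputed
    "N": (z * y / prob).mean(axis=1),                 # inverse-probability weighted
}

true_mean = 0.0                                       # EY = theta * EX = 0
mse = {name: float(np.mean((est - true_mean) ** 2))
       for name, est in estimates.items()}
print(mse)
```

Setting `prob` to one everywhere reproduces the complete-data scenario, in which the partially imputed and the simple estimator both reduce to the sample mean of the responses.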

In both the linear and the nonlinear regression models, the fully imputed
estimator performs considerably better than the partially imputed estimator.
The simple estimator in the last column is clearly outperformed by the
imputation approaches. Comparing the columns for the fully imputed estimator
with and without parameter estimation (and analogously for the
partially imputed estimator), we see that the estimator of the slope $\vartheta$ in
linear regression $r_\vartheta(X) = \vartheta X$ is, as a plug-in estimator for estimating $EY$,
better than the parameter estimator of the frequency parameter $\vartheta$ in the
nonlinear regression model $r_\vartheta(X) = \cos(\vartheta X)$: in the linear regression model
the mean squared errors of the approaches based on $\hat\vartheta$ and $\vartheta$ are very similar,
in contrast to the nonlinear model where the differences are quite large.
Let us also compare the (a) and (b) sections in the linear regression and
the nonlinear regression example, which refer to the situation where (a) responses
are missing at random and (b) all responses are available. For the
fully imputed estimator $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$ we observe the expected improved
performance when more (response) data for the estimation of $\vartheta$ are available.
The situation is different for the partially imputed estimator. Indeed
we expect that, similarly, performance will improve as the proportion of
observed responses increases. In this case $\hat\vartheta$ improves as an estimator of $\vartheta$
but, at the same time, the partially imputed estimator will discard more
and more information about the structure of the regression function. [In the
extreme case $\pi(\cdot) = 1$ it equals the empirical estimator $n^{-1}\sum_{i=1}^n Y_i$.] Our
example demonstrates that both scenarios are possible: for the linear regression
model the estimator of $\vartheta$ performs well and the simulated mean

Table 1

Simulated mean squared errors of estimators of the mean response EY

$\pi(X)$            n      FI (est.)   FI (true)   PI (est.)   PI (true)   N

Linear regression: $r_\vartheta(X) = \vartheta X$ ($\vartheta = 2$)
$1/(1 + e^{-X})$    50     0.027520    0.026639    0.036231    0.036368    0.104962
                    100    0.013502    0.013298    0.018074    0.018364    0.052680
                    1000   0.001328    0.001325    0.001794    0.001835    0.005270
$1$                 50     0.026990    0.026639    0.046322    0.046322    0.046322
                    100    0.013415    0.013298    0.023479    0.023479    0.023479
                    1000   0.001327    0.001325    0.002345    0.002345    0.002345

Nonlinear regression: $r_\vartheta(X) = \cos(\vartheta X)$ ($\vartheta = 2$)
$1/(1 + e^{-X})$    50     0.027858    0.003957    0.031163    0.013272    0.053038
                    100    0.015462    0.002001    0.017147    0.007020    0.028154
                    1000   0.001492    0.000199    0.001671    0.000696    0.002810
$1$                 50     0.016512    0.003957    0.023369    0.023369    0.023369
                    100    0.008581    0.002001    0.012043    0.012043    0.012043
                    1000   0.000852    0.000199    0.001207    0.001207    0.001207

Notes. The table entries are the simulated mean squared errors of estimators of $EY =
Er_\vartheta(X)$ with partially missing responses, $\pi(X) = 1/(1 + e^{-X})$, and completely observed
data pairs, $\pi(X) = 1$. In the first two columns we study the efficient fully imputed weighted
estimator with the ordinary least squares estimator $\hat\vartheta$ plugged in, FI (est.), and its corresponding
version using the true parameter $\vartheta = 2$, FI (true). The next two columns refer to the partially
imputed estimator using $\hat\vartheta$, PI (est.), and the version based on $\vartheta = 2$, PI (true). The last column
considers the simple estimator $n^{-1}\sum_{i=1}^n Z_iY_i/\pi(X_i)$ (N), which does not use imputation.
Note that in the sections with $\pi(X) = 1$ the columns for PI (est.), PI (true) and N are identical: since all
the indicators are 1, these estimators coincide with the empirical estimator $n^{-1}\sum_{i=1}^n Y_i$.



squared error of the partially imputed estimator in (a) is smaller than in
(b). In the nonlinear regression model the estimator of $\vartheta$ is not as good and
the mean squared error in (a) is larger than the mean squared error of the
empirical estimator in (b). Note that this observation about the performance
of the partially imputed estimator is only of secondary interest since, in any
case, the fully imputed estimator has the smaller mean squared error.

The situation is slightly more complicated when $h$ is of the form $h(x,y) =
a(x)b(y)$ with a nonlinear function $b$, for example, when higher mixed moments
of $X$ and $Y$ or just higher moments of $Y$ are estimated. Simplified
estimators are available when $b$ has a simple form. For an illustration
we consider, in a second example, estimation of the second moment
$EY^2 = Er_\vartheta(X)^2 + \sigma^2$. The fully imputed estimator is

$\frac{1}{n}\sum_{i=1}^n\frac{\sum_{j=1}^n\hat w_jZ_j\{r_{\hat\vartheta}(X_i) + \hat\varepsilon_j\}^2}{\sum_{j=1}^n Z_j} = \frac{1}{n}\sum_{i=1}^n r_{\hat\vartheta}(X_i)^2 + \frac{\sum_{j=1}^n\hat w_jZ_j\hat\varepsilon_j^2}{\sum_{j=1}^n Z_j}$.

The mean squared errors for the fully imputed and the partially imputed
estimator (with and without parameter estimation) are given in Table 2.

Consider the lower section on nonlinear regression first. We see that, as
expected, the fully imputed estimator outperforms the partially imputed
estimator, and that, in part (a) with missing responses, both estimators
are far better than the simple estimator in the last column. Using an estimator
for $\vartheta$, or the true value $\vartheta = 2$, does not have much impact on
the mean squared error here. The upper half of Table 2 on linear regression,
however, shows a different picture: although the mean squared errors
of the fully imputed and the partially imputed estimators based on the true $\vartheta$
are considerably different (which is what we would expect), the values for
the estimators based on the ordinary least squares parameter estimator
$\hat\vartheta$ suggest that the two approaches are asymptotically equivalent. For the
extreme case (b) where $\pi(\cdot) = 1$ this would mean that the fully imputed
estimator $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)^2 + n^{-1}\sum_{i=1}^n\hat w_i\hat\varepsilon_i^2$ and the empirical estimator
$n^{-1}\sum_{i=1}^n Y_i^2$ are asymptotically equivalent. This may be surprising but,
in fact, it is easy to see that this is exactly what is happening: we consider
the special example of linear regression with normal errors and the
ordinary least squares estimator $\hat\vartheta = \sum_{i=1}^n X_iY_i/\sum_{i=1}^n X_i^2$. Rewriting the



Table 2

Simulated mean squared errors of estimators of $EY^2$

$\pi(X)$            n      FI (est.)   FI (true)   PI (est.)   PI (true)   N

Linear regression: $r_\vartheta(X) = \vartheta X$ ($\vartheta = 2$)
$1/(1 + e^{-X})$    50     0.312670    0.116360    0.310263    0.161374    0.528146
                    100    0.158512    0.055343    0.157402    0.079863    0.267601
                    1000   0.016215    0.005470    0.016189    0.008113    0.027298
$1$                 50     0.174683    0.070048    0.173817    0.173817    0.173817
                    100    0.088960    0.034685    0.088455    0.088455    0.088455
                    1000   0.008630    0.003359    0.008623    0.008623    0.008623

Nonlinear regression: $r_\vartheta(X) = \cos(\vartheta X)$ ($\vartheta = 2$)
$1/(1 + e^{-X})$    50     0.086350    0.087286    0.092361    0.093401    0.176124
                    100    0.042671    0.042747    0.047054    0.047219    0.092478
                    1000   0.004260    0.004179    0.005032    0.004961    0.010153
$1$                 50     0.043774    0.043873    0.066100    0.066100    0.066100
                    100    0.021578    0.021574    0.035573    0.035573    0.035573
                    1000   0.002159    0.002116    0.003713    0.003713    0.003713

Notes. Here we study estimation of $EY^2$. The first two columns refer to the fully imputed
estimator with the ordinary least squares estimator $\hat\vartheta$ plugged in, FI (est.), and to its
version using $\vartheta = 2$, FI (true). In the next two columns we consider the partially imputed
estimator based on $\hat\vartheta$, PI (est.), and on $\vartheta = 2$, PI (true). In the last column the mean squared errors of
$n^{-1}\sum_{i=1}^n Z_iY_i^2/\pi(X_i)$ (N) are listed.
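empirical estimator gives $n^{-1}\sum_{i=1}^n Y_i^2 = n^{-1}\sum_{i=1}^n(\hat\vartheta X_i)^2 + n^{-1}\sum_{i=1}^n\hat\varepsilon_i^2 +
n^{-1}2\hat\vartheta\sum_{i=1}^n X_i\hat\varepsilon_i$. The last term vanishes for the least squares estimator $\hat\vartheta$ so
that $n^{-1}\sum_{i=1}^n Y_i^2 = n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)^2 + n^{-1}\sum_{i=1}^n\hat\varepsilon_i^2$. Finally, by our results
from Section 3, the estimators $n^{-1}\sum_{i=1}^n\hat w_i\hat\varepsilon_i^2$ and $n^{-1}\sum_{i=1}^n\hat\varepsilon_i^2$ of the error
variance $\sigma^2$ are asymptotically equivalent.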
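
The algebraic identity behind this equivalence is quick to confirm on simulated complete data. The snippet is only an illustration; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x + rng.standard_normal(n)

# Ordinary least squares through the origin, all responses observed.
theta_hat = np.sum(x * y) / np.sum(x * x)
resid = y - theta_hat * x

empirical = np.mean(y ** 2)                                   # n^{-1} sum Y_i^2
decomposed = np.mean((theta_hat * x) ** 2) + np.mean(resid ** 2)
cross_term = 2.0 * theta_hat * np.mean(x * resid)             # vanishes for OLS

print(empirical, decomposed, cross_term)
```

The cross term is zero up to rounding because the least squares residuals are orthogonal to the covariate.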

In the next example we restrict our attention to linear regression, $r_\vartheta(X) =
\vartheta X$, $\vartheta = 2$, and consider estimation of a more complicated expectation,
namely of $Eh(X,Y) = E(Xe^{XY})$. In contrast to the previous examples the
(weighted) fully imputed estimator cannot be reduced. The mean squared
errors of this estimator and of the partially imputed estimator are given in
Table 3. For each estimator we study the two cases with and without parameter
estimation. Again we observe that the performance of the estimators is
not much affected by the plug-in parameter estimator. Comparing the fully
and the partially imputed estimators we see that the fully imputed estimator
clearly outperforms the partially imputed estimator. In addition we also
calculate the simulated mean squared error of the unweighted (inefficient)
version of our fully imputed estimator. The performance of this estimator
turns out to lie between the fully and the partially imputed one. In particular,
the simulations in section (b), where all data are observed and where
the partially imputed estimator equals the empirical estimator, confirm our
theoretical observation that incorporating the information about the location
of the errors, for example in the form of weights as done in this article,
is important.

In order to study the behavior of the fully imputed estimator for multidimensional
$\vartheta$ we again studied estimation of $E(Xe^{XY})$. For our simulations



Table 3

Simulated mean squared errors of estimators of $E\{X\exp(XY)\}$ in linear regression

$\pi(X)$            n      FI (est.)   FI (true)   U         PI (est.)   PI (true)

$1/(1 + e^{-X})$    50     0.32563     0.29024     0.36187   0.48164     0.47769
                    100    0.15017     0.14085     0.18147   0.24192     0.24698
                    1000   0.01384     0.0137      0.01992   0.02577     0.02703
$1$                 50     0.28988     0.27262     0.32220   0.58566     0.58566
                    100    0.13804     0.13413     0.16520   0.29948     0.29948
                    1000   0.01332     0.01329     0.01663   0.02997     0.02997

Notes. We consider estimation of $Eh(X,Y) = E(Xe^{XY})$ in the linear regression model
$r_\vartheta(X) = \vartheta X$, $\vartheta = 2$. The first two columns give the mean squared errors of the fully
imputed estimator with the least squares estimator $\hat\vartheta$ plugged in, FI (est.), and its version
using $\vartheta = 2$, FI (true). The third column contains the mean squared errors of the unweighted
version U of FI (est.). The last two columns refer to the partially imputed estimator using $\hat\vartheta$,
PI (est.), and $\vartheta = 2$, PI (true). Note that if $\pi(X) = 1$ then the partially imputed estimator again
equals the empirical estimator, PI (est.) = PI (true) = $n^{-1}\sum_{i=1}^n X_i\exp(X_iY_i)$.

Table 4

Simulated mean squared errors of estimators of $E\{X\exp(XY)\}$ with $\vartheta \in \mathbb{R}^p$ ($p = 2, 3$)

$r_\vartheta(X, U, V)$                                   FI (est.)   FI (true)   U        PI (est.)   PI (true)

$\vartheta_1X + \vartheta_2U$                            0.2465      0.2272      0.2855   0.3965      0.4018
$\vartheta_0 + \vartheta_1X + \vartheta_2U$              0.3048      0.2272      0.3048   0.4259      0.4018
$\vartheta_1X + \vartheta_2U + \vartheta_3V^2$           0.4367      0.3750      0.4434   0.5696      0.5760

Notes. The three rows refer to two regression functions with different parametrizations. We
have $\vartheta_0 = 0$, $\vartheta_1 = 2$, $\vartheta_2 = -1$ and $\vartheta_3 = 0.5$, $n = 100$, $\pi(X) = 1/(1 + e^{-X})$. The covariates
$X$, $U$ and $V$ are independent from a uniform distribution on $(-1, 1)$. The parameters are
estimated using least squares. The notation is explained in Table 3.



Table 5

Simulated mean squared errors of estimators of $Eh(X,Y)$ with $\hat\vartheta$ inefficient

                           $EY$                       $EY$                       $E(Xe^{XY})$
                           $r_\vartheta(X) = \cos(\vartheta X)$     $r_\vartheta(X) = \vartheta X$            $r_\vartheta(X) = \vartheta X$
$\pi(X)$            n      FI         PI         FI         PI         FI         PI

$1/(1 + e^{-X})$    50     0.03124    0.03545    0.02742    0.03868    0.50275    0.72944
                    100    0.01841    0.02057    0.01375    0.01938    0.24148    0.48759
$1$                 50     0.02000    0.02864    0.02689    0.05181    0.41476    0.79949
                    100    0.01016    0.01448    0.01359    0.02589    0.25796    0.63103

Notes. We compare the fully and the partially imputed estimators of $EY$ and $E(Xe^{XY})$,
keeping the previous notation. Again, $\hat\vartheta$ is the least squares estimator, but now the errors
are from a $t$-distribution with 10 degrees of freedom.



we restricted our attention to missing data and on samples of size n = 100, 
and considered three different regression models which are given in Table 
4. Note that the second regression function, 'Sq + i!)iX + ■d2U with ??o = 0, 
"!?! = 2 and '(?2 = —1, equals the first one, namely 2X — U, but it involves a 
three-dimensional parameter. As expected, the increase of dimension impairs 
the performance of the fully imputed (weighted and unweighted) and of 
the partially imputed estimator. Note that the weighted and unweighted 
fully imputed estimator (FI and U) in the second regression model are the 
same: we consider the least squares estimator in a regression model with 
an intercept term 7?o- Iii this model the least squares estimator solves, by 
construction, J2]=i^j^j = (which implies that all weights wj equal one). 
Again we observe that the fully imputed estimator consistently outperforms 
the partially imputed estimator. 

We conclude this section with a small simulation study to examine the 
behavior of the fully imputed estimator when i? is inefficient. The simplest 
setting is to choose the ordinary least squares estimator, as we did before, 
but in a model with non-normal errors. In Table 5 we consider estimation
of the mean response and of E(Xe^^) for linear and nonlinear regression, 
and for errors from a t-distribution. The results are similar to the previous 
ones: again the fully imputed estimator performs best, though not as well 
as if the errors are, in fact, from a normal distribution (cf. Tables 1-3). 
Simulations with a logistic error density turned out similarly, confirming the 
better performance of the imputation method. At least in these examples, 
with moderate sample sizes n = 50 and n = 100, the construction of $\hat\vartheta$ does
not seem to be as important as the choice between the full and the partial 
imputation approaches. 
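
A comparison of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reproduction of Table 5: it takes the linear model $r_\vartheta(X) = \vartheta X$ with $\vartheta = 2$, a constant response probability of 1/2, and least squares through the origin on the complete cases (all three are assumptions). For EY the fully imputed estimator is the average of the fitted values, while the partially imputed estimator is taken here in its usual form, replacing only the missing responses by fitted values.

```python
import numpy as np

def simulate_mse(n=100, reps=2000, theta=2.0, df=10, p_obs=0.5, seed=0):
    """Monte Carlo MSE of fully (FI) and partially (PI) imputed estimators of EY."""
    rng = np.random.default_rng(seed)
    true_mean = 0.0  # E[theta * X] = 0 for X uniform on (-1, 1), and E[eps] = 0
    fi = np.empty(reps)
    pi = np.empty(reps)
    for k in range(reps):
        X = rng.uniform(-1.0, 1.0, n)
        Y = theta * X + rng.standard_t(df, n)   # t-distributed errors
        Z = rng.random(n) < p_obs               # True = response observed
        # Ordinary least squares through the origin, complete cases only.
        theta_hat = np.sum(X[Z] * Y[Z]) / np.sum(X[Z] ** 2)
        fitted = theta_hat * X
        fi[k] = fitted.mean()                   # impute every response
        pi[k] = np.where(Z, Y, fitted).mean()   # impute missing responses only
    return np.mean((fi - true_mean) ** 2), np.mean((pi - true_mean) ** 2)

mse_fi, mse_pi = simulate_mse()
print(f"FI: {mse_fi:.5f}  PI: {mse_pi:.5f}")
```

In this setup the partially imputed estimator carries the extra noise of the observed errors, so its simulated mean squared error comes out larger.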

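6.3. Confidence intervals. By Theorem 5.2 the fully imputed weighted
estimator $n^{-1}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta)$ is asymptotically normally distributed, with
asymptotic variance

$$\sigma_{FI}^2 = E\chi^2(X,\vartheta) + (EZ)^{-1}E\bar h^2(\varepsilon) - \{1 + (EZ)^{-1}\}E^2h(X,Y) - \frac{E^2\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2 EZ} + D^\top E(Z\zeta\zeta^\top)^{-1}D$$

(see Theorem 5.2 for the notation). An asymptotic confidence interval for
$Eh(X,Y)$ with confidence level $1 - \alpha$ is

$$\left( \frac{1}{n}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) - z_{\alpha/2}\sqrt{\frac{\hat\sigma_{FI}^2}{n}},\ \ \frac{1}{n}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) + z_{\alpha/2}\sqrt{\frac{\hat\sigma_{FI}^2}{n}} \right),$$

where $z_{\alpha/2}$ denotes the upper $\alpha/2$-quantile of the standard normal distribution,
and where $\hat\sigma_{FI}$ is a consistent estimator of $\sigma_{FI}$. Consider, for example,
estimation of EY with $r_\vartheta(X)$ depending on a scalar parameter $\vartheta$, which covers
our previous simple examples $r_\vartheta(X) = \vartheta X$ and $r_\vartheta(X) = \cos(\vartheta X)$. Here
the confidence interval is $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i) \pm z_{\alpha/2}(\hat\sigma_{FI}^2/n)^{1/2}$. The asymptotic
variance of $n^{-1/2}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$ is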

4l = Var,.(X)+ ^^f^o*^" 




EZ YaT{r^{X)\Z = l}E{£^{e)} ' 

The expectations in the formula can be estimated by empirical methods,
with a consistent estimator $\hat\vartheta$ for the parameter plugged in. Consider, for
example, $\operatorname{Var}\{\dot r_\vartheta(X)\mid Z = 1\} = E\{\dot r_\vartheta(X)^2\mid Z = 1\} - E^2\{\dot r_\vartheta(X)\mid Z = 1\}$. The
first expectation is estimated by $(\sum_{i=1}^n Z_i)^{-1}\sum_{i=1}^n Z_i\,\dot r_{\hat\vartheta}(X_i)^2$, and analogously
the second one.
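
The empirical estimator of a conditional variance given Z = 1 can be sketched as follows. The helper name `cond_var_given_observed`, the plugged-in value `theta_hat = 0.9` and the missingness mechanism are hypothetical choices made for illustration; the function applies to any values $g(X_i)$, shown here for the derivative of $\cos(\vartheta x)$ with respect to $\vartheta$.

```python
import numpy as np

def cond_var_given_observed(values, Z):
    """Estimate Var{g(X) | Z = 1} by complete-case averages.

    values : array of g(X_i), available for all i since X is always observed
    Z      : array of indicators (1 or True = response observed)
    """
    values = np.asarray(values, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n_obs = Z.sum()
    second_moment = np.sum(Z * values ** 2) / n_obs   # estimates E{g(X)^2 | Z = 1}
    first_moment = np.sum(Z * values) / n_obs         # estimates E{g(X) | Z = 1}
    return second_moment - first_moment ** 2

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, 200)
Z = rng.random(200) < 0.5              # assumed missingness indicator
theta_hat = 0.9                        # hypothetical parameter estimate
r_dot = -X * np.sin(theta_hat * X)     # d/d(theta) of cos(theta * x) at theta_hat
v = cond_var_given_observed(r_dot, Z)
print(v)
```

The result agrees with the sample variance of the complete cases computed with divisor equal to the number of observed responses.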

In order to confirm the theoretical results we also performed some simu- 
lation studies, generating confidence intervals for the above examples with 
the described estimation method. As expected, for a = 0.05 we obtained the 
desired coverage probability 0.95. 
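
Given a point estimate and a consistent variance estimate, the interval itself is a one-line computation. The helper below is a minimal sketch; the function name `normal_ci` and the numeric inputs in the usage line are hypothetical, and the variance estimate must be supplied by the user from the appropriate formula.

```python
import math
from statistics import NormalDist

def normal_ci(estimate, var_hat, n, alpha=0.05):
    """Asymptotic (1 - alpha) confidence interval: estimate +/- z * sqrt(var_hat / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile
    half_width = z * math.sqrt(var_hat / n)
    return estimate - half_width, estimate + half_width

# Usage with made-up inputs: estimate 0.0, variance estimate 1.0, n = 100.
lower, upper = normal_ci(0.0, 1.0, 100)
print(lower, upper)
```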

APPENDIX 

Lemma A.1. Suppose that Assumption 1 is satisfied. Then, for a $\sqrt{n}$
consistent estimator $\hat\vartheta$ of $\vartheta$, the statements (3.2)-(3.4) hold.
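Proof. In order to prove (3.2)-(3.4) we first show

$$\max_{1\le i\le n} |Z_i\hat\varepsilon_i - Z_i\varepsilon_i| = o_p(1), \tag{A.1}$$

$$\sum_{i=1}^n Z_i(\hat\varepsilon_i - \varepsilon_i^*)^2 = o_p(1) \quad\text{with } \varepsilon_i^* = \varepsilon_i - \dot r_\vartheta(X_i)^\top(\hat\vartheta - \vartheta). \tag{A.2}$$

Result (A.2) immediately follows from the $\sqrt{n}$ consistency of $\hat\vartheta$ and the
stochastic differentiability of $r_\vartheta$ [implication (2.1) of Assumption 1]:

$$\sum_{i=1}^n Z_i(\hat\varepsilon_i - \varepsilon_i^*)^2 = \sum_{i=1}^n Z_i\big[\hat\varepsilon_i - \{\varepsilon_i - \dot r_\vartheta(X_i)^\top(\hat\vartheta - \vartheta)\}\big]^2 \le \sum_{i=1}^n \{r_{\hat\vartheta}(X_i) - r_\vartheta(X_i) - \dot r_\vartheta(X_i)^\top(\hat\vartheta - \vartheta)\}^2 = o_p(1).$$

This gives $\max_{1\le i\le n} |Z_i(\hat\varepsilon_i - \varepsilon_i^*)| = o_p(1)$. In order to establish (A.1) it therefore
suffices to show $\max_{1\le i\le n} |Z_i(\varepsilon_i^* - \varepsilon_i)| = o_p(1)$. We have

$$\max_{1\le i\le n} |Z_i(\varepsilon_i^* - \varepsilon_i)| \le \max_{1\le i\le n} |\varepsilon_i^* - \varepsilon_i| \le |\hat\vartheta - \vartheta| \cdot \max_{1\le i\le n} |\dot r_\vartheta(X_i)|.$$

Since $\hat\vartheta$ is $\sqrt{n}$ consistent we only need $\max_{1\le i\le n} |\dot r_\vartheta(X_i)| = o_p(n^{1/2})$. But
this holds by Owen (2001), Lemma 11.2, since the variables $|\dot r_\vartheta(X_i)|$, $i = 1, \ldots, n$,
are i.i.d. and, by Assumption 1, have finite second moments. This
shows $\max_{1\le i\le n} |Z_i(\varepsilon_i^* - \varepsilon_i)| = o_p(1)$.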

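Equation (3.2), $\max_{1\le i\le n} |Z_i\hat\varepsilon_i| = o_p(n^{1/2})$, can be seen as follows: we can
bound $\max_{1\le i\le n} |Z_i\hat\varepsilon_i|$ by $\max_{1\le i\le n} |Z_i\hat\varepsilon_i - Z_i\varepsilon_i| + \max_{1\le i\le n} |Z_i\varepsilon_i|$. The first
term is $o_p(1)$ by (A.1) and the second term is $o_p(n^{1/2})$ by Owen's Lemma
11.2 since the $Z_i\varepsilon_i$ are i.i.d. with finite variance. We now show (3.3), that is,

$$\frac{1}{n}\sum_{i=1}^n Z_i\hat\varepsilon_i = \frac{1}{n}\sum_{i=1}^n Z_i\varepsilon_i - EZ\,E\{\dot r_\vartheta(X)\mid Z = 1\}^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}).$$

In view of (A.2), $n^{-1}\sum_{i=1}^n Z_i\hat\varepsilon_i = n^{-1}\sum_{i=1}^n Z_i\varepsilon_i^* + o_p(n^{-1/2})$. By the law of
large numbers we obtain

$$\frac{1}{n}\sum_{i=1}^n Z_i\varepsilon_i^* = \frac{1}{n}\sum_{i=1}^n Z_i\varepsilon_i - \frac{1}{n}\sum_{i=1}^n Z_i\dot r_\vartheta(X_i)^\top(\hat\vartheta - \vartheta) = \frac{1}{n}\sum_{i=1}^n Z_i\varepsilon_i - E\{Z\dot r_\vartheta(X)\}^\top(\hat\vartheta - \vartheta) + o_p(n^{-1/2}).$$

Since $E\{Z\dot r_\vartheta(X)\} = EZ\,E\{\dot r_\vartheta(X)\mid Z = 1\}$ we have established (3.3).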
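
The step "in view of (A.2)" used above can be spelled out with the Cauchy-Schwarz inequality. The following is a short sketch, writing $\hat\varepsilon_i$ for the residuals and $\varepsilon_i^*$ for the linearized errors from (A.2), and using only that $Z_i$ takes values in $\{0, 1\}$:

```latex
\left| \frac{1}{n}\sum_{i=1}^n Z_i(\hat\varepsilon_i - \varepsilon_i^*) \right|
  \le \frac{1}{n}\Big(\sum_{i=1}^n Z_i\Big)^{1/2}
      \Big(\sum_{i=1}^n Z_i(\hat\varepsilon_i - \varepsilon_i^*)^2\Big)^{1/2}
  \le n^{-1/2}\Big(\sum_{i=1}^n Z_i(\hat\varepsilon_i - \varepsilon_i^*)^2\Big)^{1/2}
  = o_p(n^{-1/2}).
```

The second inequality uses $\sum_{i=1}^n Z_i \le n$, and the final equality is (A.2).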
Our last auxiliary result to prove is (3.4), 

-j^ n 1 " 

1=1 i=l 
The second equality is just a consequence of the law of large numbers. To
see that the first equation holds consider

-.n 1™ 

n n n n 

i=\ 1=1 1=1 %=i 

The second term on the right-hand side is $o_p(1)$ by (A.1). To show that the
first expression is $o_p(1)$ it suffices, in view of (A.2), to consider

-in 1 " 

n n 
1=1 1=1 

This term is $o_p(1)$ since $\hat\vartheta$ is $\sqrt{n}$ consistent and since $\dot r_\vartheta(X)$ is in $L_2(P)$. □